Duration:
6/29/2021 10:45 am
Group Responsible:
IT Fabric
Affected Area:
CDCE
Expected Impact:
Approximately 250 nodes in the shared pool (spool0XYZ) and the ATLAS T1 (acas0XYZ) shut themselves down automatically.
Maintenance Type:
Unplanned/Outage
Description:
The CDCE room in our datacenter suffered another cooling failure around 10:30 AM this morning (6/29). At 10:45 AM, approximately 250 nodes in the shared pool (spool0XYZ) and the ATLAS T1 (acas0XYZ) shut themselves down automatically to avoid equipment damage. F&O is working on the problem, and we are continuing to monitor whether additional compute nodes need to be shut down to shed heat load.