Duration:
6/29/2021 10:45 AM

Group Responsible:
IT Fabric

Affected Area:
CDCE

Expected Impact:
Approximately 250 nodes in the shared pool (spool0XYZ) and the ATLAS T1 (acas0XYZ) shut themselves down automatically.

Maintenance Type:
Unplanned/Outage

Description:
The CDCE room in our datacenter suffered another cooling failure at around 10:30 AM this morning (6/29). At 10:45 AM, approximately 250 nodes in the shared pool (spool0XYZ) and the ATLAS T1 (acas0XYZ) shut themselves down automatically to avoid equipment damage. F&O is working on the problem, and we are continuing to monitor whether additional compute node shutdowns will be needed to shed heat load.