By SDCC News

Duration:
6/16/2021 12:30 pm — 6/16/2021 3:00 pm

Group Responsible:
IT Fabric

Affected Area:
ATLAS T1 Compute Nodes, spool0XYZ Systems

Expected Impact:

Maintenance Type:
Unplanned/Outage

Description:
A major cooling failure occurred in the CDCE room of SDCC's datacenter earlier today (6/15), starting around 12:30 PM EST, caused by a problem with the building's chilled water system. Temperatures rose quickly, and around 1:00 PM automated monitoring software shut down the compute nodes in that room to avoid equipment damage. This affected all ATLAS T1 compute nodes and a large portion of the shared pool (all spool0XYZ systems). Parts of our RHEV system were also affected. The building's chilled water circulation was repaired by approximately 3:00 PM. The farm equipment was then powered back on and reopened to jobs after the room temperature stabilized at 3:30 PM.

At this time we believe all affected services have been restored. If you continue to experience issues, please submit a ticket to RT.