By Anonymous

Duration:
3/22/2025 11:25 am — 3/22/2025 10:00 pm

Group Responsible:
IT Fabric

Affected Area:
HTCondor Shared Pool

Expected Impact:
A portion of the compute farm is unavailable

Maintenance Type:
Unplanned/Outage

Description:
About half of the SL7 hosts on the shared HTCondor pool (~7K job slots) experienced an outage at approximately 11:25 am today. The Alma 9 hosts were apparently unaffected.

Experts are on site investigating and will update further as the situation evolves. Jobs submitted to SL7 hosts may be delayed due to limited resources until service is fully restored, and some may need to be restarted.