Duration:
12/30/2024 9:00 am — 12/30/2024 6:00 pm
Group Responsible:
IT Fabric
Affected Area:
All SDCC services
Expected Impact:
No access to SDCC resources (computing, storage and services)
Maintenance Type:
Planned Maintenance/Downtime
Description:
A critical maintenance/replacement procedure on the BNL main electrical grid, scheduled for Monday, Dec. 30th, was announced
to the SDCC on very short notice last week. This procedure is planned to start around 12 noon and last approximately 4 hours.
We recognize this procedure is happening during the BNL-declared "quiet period", but a postponement
would incur increased costs to the Lab and potentially place this must-do procedure during the start-up period for RHIC run 25,
which is deemed even less desirable than the current plan. BNL management has decided to go ahead with the Dec. 30th
procedure, as planned.
This procedure requires transferring the power source from the electrical utility to the back-up generator, with a UPS to
bridge the time gap (a few seconds) between utility and generator power, and then remaining on generator power for the duration
of this procedure. Because there is a small risk of failure during the transfer process and in generator operations and because of
reduced staff availability during the BNL quiet period, the SDCC management has decided to quiet down the facility resources
to minimize the chances of data corruption, service disruptions and hardware failures, in the unlikely event that an unplanned
power outage occurs.
Quieting down means: 1) draining batch jobs (HTCondor and Slurm), holding new ones from starting, and stopping interactive
access to SDCC CPU resources on SUNDAY (DEC. 29TH) AT 3 PM ET and 2) stopping all data read/write and movement activities
(disk and tape) on MONDAY (DEC. 30TH) AT 9 AM ET.
Announcements to SDCC Liaisons and program/experimental PoCs will be made when SDCC resources are fully available again.