Duration:
5/15/2025 8:30 am — 5/15/2025 9:30 am
Group Responsible:
SDCC Operations
Affected Area:
Git repositories hosted on git.racf.bnl.gov
Expected Impact:
Git instance unavailable
Maintenance Type:
Planned Maintenance/Downtime
Description:
On Thursday, May 15, the RACF Gitea instance (git.racf.bnl.gov/gitea) will undergo maintenance from 8:30 am to 9:30 am. Repositories hosted on this instance may be unavailable for all or part of this time.
Duration:
5/2/2025 2:45 am — 5/2/2025 5:25 am
Group Responsible:
IT Fabric
Affected Area:
Linux Farm
Expected Impact:
Temporary loss of CPU resources
Maintenance Type:
Unplanned/Outage
Duration:
4/21/2025 8:00 am — 4/21/2025 8:30 am
Group Responsible:
SDCC Operations
Affected Area:
Git repositories hosted on git.racf.bnl.gov
Expected Impact:
Some personal access tokens may need to be regenerated
Maintenance Type:
Transparent Upgrade/Maintenance
Duration:
3/24/2025 11:57 am — (end time not set)
Group Responsible:
IT Fabric
Affected Area:
sl7 portion of the condor shared pool
Expected Impact:
Jobs should run within normal time; some may need a restart
Maintenance Type:
Information
Description:
sl7 shared pool resource levels have been restored. Investigation into the root cause continues; the problem could recur until the cause is identified and resolved.
Duration:
3/24/2025 3:25 am — 3/24/2025 12:00 pm
Group Responsible:
IT Fabric
Affected Area:
sl7 condor shared pool
Expected Impact:
Delayed jobs; after recovery, some will likely need a restart
Maintenance Type:
Unplanned/Outage
Description:
The sl7 portion of the shared pool is suffering from reduced resources. Investigation is under way ...
Duration:
3/22/2025 11:25 am — 3/22/2025 10:00 pm
Group Responsible:
IT Fabric
Affected Area:
HTCondor Shared Pool
Expected Impact:
A portion of the compute farm is unavailable
Maintenance Type:
Unplanned/Outage
Duration:
3/22/2025 11:25 am — 3/22/2025 6:36 pm
Group Responsible:
IT Fabric
Affected Area:
HTCondor Shared Pool
Expected Impact:
sl7 jobs should complete as before the outage
Maintenance Type:
Information
Description:
sl7 shared pool resource levels have been restored.
Duration:
3/7/2025 9:52 am — 3/7/2025 6:30 pm
Group Responsible:
IT Fabric
Affected Area:
Many SDCC storage, compute, and services systems
Expected Impact:
Impact to multiple services and experiments
Maintenance Type:
Unplanned/Outage
Description:
The bulk of services were restored as of ~18:30 EST on 3/7.
Duration:
3/7/2025 9:42 am — 3/7/2025 12:00 pm
Group Responsible:
IT Fabric
Affected Area:
Many SDCC storage, compute, and services systems
Expected Impact:
Impact to multiple services and experiments
Maintenance Type:
Unplanned/Outage
Description:
The BNL SCDF (SDCC) b725 datacenter experienced a power loss on at least one of its power systems. Recovery is underway. A postmortem will be conducted.
Duration:
3/3/2025 10:25 pm — 3/4/2025 3:00 pm
Group Responsible:
IT Services
Affected Area:
BNLBox
Expected Impact:
Service recovered
Maintenance Type:
Information
Description:
The backend Lustre storage server has been fixed and BNLBox service has been restored. Please report any residual issues via email to RT-RACF-StorageManagement@bnl.gov.