By Ofer Rind | Mon, 04/18/2022 - 18:36

Sites periodically need to declare a downtime, either scheduled in advance (maintenance, etc.) or, in some cases, in response to a critical outage (power, cooling, etc.). In the latter case, the ability to declare the outage quickly is useful for informing ATLAS about the problem. Otherwise, a site that is down but has not yet reported the outage may begin receiving problem tickets (e.g., GGUS tickets).

US sites have two options for declaring downtimes:

  1. Use the OSG downtime declaration system, or
  2. Use the ATLAS CRIC downtimes interface

For the OSG downtime declaration, the steps are:

  1. Select a 'Facility'
  2. Under the facility, select one or more 'Resources'
  3. Each resource should have one or more 'Services' associated with it - select these as appropriate
  4. Indicate whether the downtime is a 'Scheduled' one
  5. Add a description of the event
  6. Select the 'Severity' level - NOTE: It is recommended that 'Outage' be used in this field. Any of the other options will be treated by ATLAS / ADC as only an "AT RISK" status.
  7. Specify the starting and ending times and dates. Note that the times are in UTC.
  8. Hit 'Generate' to create a YAML file
  9. Follow the instructions for adding the new downtime to an existing .yaml file (via GitHub)
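
Because the times in step 7 must be given in UTC, it can help to convert the site's local maintenance window before filling in the form. The following is a minimal sketch, not part of the OSG or CRIC tooling: the helper names `to_utc` and `atlas_status`, the example dates, and the `America/New_York` time zone are all assumptions made for illustration. The second helper simply restates the note in step 6; the label "OUTAGE" is this sketch's own shorthand for how 'Outage' is treated.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def to_utc(local_time: str, site_tz: str) -> datetime:
    """Interpret a 'YYYY-MM-DD HH:MM' string in the site's time zone and return it in UTC."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=ZoneInfo(site_tz)).astimezone(timezone.utc)


def atlas_status(severity: str) -> str:
    """Restate the step 6 note: any severity other than 'Outage' counts only as AT RISK."""
    # "OUTAGE" is a hypothetical label used by this sketch, not an official status name.
    return "OUTAGE" if severity == "Outage" else "AT RISK"


# Hypothetical maintenance window, given in the site's local time.
start = to_utc("2022-04-18 08:00", "America/New_York")
end = to_utc("2022-04-18 17:00", "America/New_York")

print(start.strftime("%Y-%m-%d %H:%M UTC"))  # 2022-04-18 12:00 UTC
print(end.strftime("%Y-%m-%d %H:%M UTC"))    # 2022-04-18 21:00 UTC
print(atlas_status("Outage"))                # OUTAGE
```

Using a named time zone rather than a fixed offset lets the conversion account for daylight saving time on the date in question.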

More information about declaring downtimes via OSG can be found here.

If using the CRIC downtimes interface:

  1. Select the 'Site'
  2. Indicate the Start & End times (again UTC)
  3. Select the 'Severity' (Outage is recommended)
  4. Select the 'Classification' (scheduled or unscheduled)
  5. Add a description of the event
  6. Select the affected 'Services' (similar to OSG 'Resources')
  7. Select the affected 'Protocols' (similar to OSG 'Services')
  8. The 'Affected Services' field can be left blank - the system will generate the necessary value(s)
  9. Click 'Check input data' and follow the instructions

NOTE: The CRIC downtime interface has a "single button" feature for declaring all services at a site down at once. To use it, stop after step 5 above and let the system auto-generate the remaining fields.