CRS Technical Information

Job Statuses

CREATED: In the CRS system but not a candidate to run, not seen by submit-daemon yet
QUEUED: Submitted to the negotiator, will eventually run
STAGING: If inputs require staging, job will enter this stage
SUBMITTED: All stage-files (if any) are ready, submitted to condor
IMPORTING: Condor job is running, processing the input files
RUNNING: All input files staged to node, job executing now
EXPORTING: Job exes ended, files are being exported
KILLED: Jobs that were staging, submitted or running can be killed. Can be reset from here.
HELD: Jobs that get Held by Condor enter this state. Can be reset from here
RETRY: Jobs that had their files staged but by the time they ran their files were back on tape, will be automatically resubmitted.
ERROR: Jobs where something went wrong, see error message for details

DONE: Jobs that have finished without error, will be cleaned up if auto_remove is set

Normal Job Progression

CREATED->QUEUED->[STAGING]->SUBMITTED->IMPORTING->RUNNING->EXPORTING->DONE

With QUEUED and STAGING being handled by the submitd, and IMPORTING through EXPORTING being in Condor.

File Status Sequences

Each file goes through a series of status sequences just like a job. Here are the possible statuses and their transitions.

Input Files

NULL: Files created in this state by default
REQUESTED: Files that have stage requests in the HPSS Batch system
STAGING: Files currently being staged by HPSS, not yet ready
READY: Files whose stage requests have come back successfully
MISS: Files that were staged, but the job ran later and found them wiped from the cache
IMPORTING: Files that are currently being PFTP'd to the execute node
DONE: Files that have import successfully
ERROR: Something went wrong staging/importing the file, see message for details
UNKNOWN: Files lost/dropped by HPSS Batch system

Normal Progression

NULL->[REQUESTED->STAGING->READY]->IMPORTING->DONE

Where REQUESTED--READY are handled from within the submit-daemon only for file types that require pre-staging (HPSS)

NULL: Default state, files created here
FOUND: Job finished executing and the output file is found in the working-directory
EXPORTING: Job is actively staging the file out
DONE: File is in its final destination

NOT_FOUND: Job finished but file wasn't created by the job
ERROR: Something went wrong, see message for details

Normal Progression

NULL->FOUND->EXPORTING->DONE

Queue Priority

The following formula is how shares of the farm are calculated by queue priorities, where the factors for each queue q based on their priorities p_q are later normalized to 1.0

Queued Job Scaling

The target size of the queue of idle and staging jobs shrinks as the farm fills up. You provide two parameters in the config file, q_empty and q_full, which are percentages. These represent the total number of idle+staging jobs as a percentage of the slots available to run CRS jobs. For example, q_empty = 80, q_full = 4 would mean that with no jobs running, the number of jobs staging/submitted would be 0.8 * farm_size, and when the farm is full, 0.04 * farm_size. There is an exponential decay between these two points given by the following formula.

Where 0 < q_n < 1 (in the configs these are precentages so enter 0-100)

Job Statuses

Normal Job Progression

File Status Sequences

Input Files

Normal Progression

Normal Progression

Queue Priority

Queued Job Scaling

Information

News & Announcements