Job Statuses
- CREATED
- In the CRS system but not yet a candidate to run; not yet seen by the submit-daemon
- QUEUED
- Submitted to the negotiator, will eventually run
- STAGING
- If inputs require staging, job will enter this stage
- SUBMITTED
- All stage-files (if any) are ready; job has been submitted to Condor
- IMPORTING
- Condor job is running, processing the input files
- RUNNING
- All input files staged to node, job executing now
- EXPORTING
- Job executable has ended; output files are being exported
- KILLED
- Jobs that were staging, submitted, or running can be killed. They can be reset from here.
- HELD
- Jobs held by Condor enter this state. They can be reset from here.
- RETRY
- Jobs whose files were staged but were back on tape by the time the job ran. These jobs are resubmitted automatically.
- ERROR
- Jobs where something went wrong, see error message for details
- DONE
- Jobs that have finished without error; they will be cleaned up if auto_remove is set
Normal Job Progression
CREATED->QUEUED->[STAGING]->SUBMITTED->IMPORTING->RUNNING->EXPORTING->DONE
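QUEUED and STAGING are handled by the submit-daemon (submitd), while IMPORTING through EXPORTING take place in Condor. STAGING is in brackets because it is entered only when the job's inputs require staging.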
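The normal progression above can be sketched as a small lookup. This is a minimal illustration only, not CRS code: the names `NORMAL_PROGRESSION`, `HANDLED_BY`, `expected_path`, and `next_status` are hypothetical, and the exceptional statuses (KILLED, HELD, RETRY, ERROR) are deliberately left out.

```python
# Normal job progression as listed above; STAGING is optional.
NORMAL_PROGRESSION = [
    "CREATED", "QUEUED", "STAGING", "SUBMITTED",
    "IMPORTING", "RUNNING", "EXPORTING", "DONE",
]

# Which component drives each status, for the statuses the text assigns.
HANDLED_BY = {
    "QUEUED": "submit-daemon",
    "STAGING": "submit-daemon",
    "IMPORTING": "condor",
    "RUNNING": "condor",
    "EXPORTING": "condor",
}


def expected_path(needs_staging):
    """Return the statuses a job passes through on the normal path."""
    return [s for s in NORMAL_PROGRESSION if needs_staging or s != "STAGING"]


def next_status(current, needs_staging):
    """Return the next status on the normal path, or None after DONE."""
    path = expected_path(needs_staging)
    i = path.index(current)  # raises ValueError for statuses off the normal path
    return path[i + 1] if i + 1 < len(path) else None
```

For a job whose inputs need no staging, `next_status("QUEUED", False)` skips STAGING and returns `"SUBMITTED"`.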
File Status Sequences
Like a job, each file moves through a sequence of statuses. The possible statuses and their transitions are listed below.
Input Files
- NULL
- Files created in this state by default
- REQUESTED
- Files that have stage requests in the HPSS Batch system
- STAGING
- Files currently being staged by HPSS, not yet ready
- READY
- Files whose stage requests have come back successfully
- MISS
- Files that were staged, but the job ran later and found them wiped from the cache
- IMPORTING
- Files that are currently being PFTP'd to the execute node
- DONE
- Files that have been imported successfully
- ERROR
- Something went wrong staging/importing the file, see message for details
- UNKNOWN
- Files lost/dropped by HPSS Batch system
Normal Progression
NULL->[REQUESTED->STAGING->READY]->IMPORTING->DONE
The REQUESTED through READY steps are handled within the submit-daemon and apply only to file types that require pre-staging (HPSS).
Output Files
- NULL
- Default state, files created here
- FOUND
- Job finished executing and the output file is found in the working-directory
- EXPORTING
- Job is actively staging the file out
- DONE
- File is in its final destination
- NOT_FOUND
- Job finished but file wasn't created by the job
- ERROR
- Something went wrong, see message for details
Normal Progression
NULL->FOUND->EXPORTING->DONE
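The two file progressions can be checked with a short helper. This is a minimal sketch under stated assumptions: the function names and the `kind`/`prestaged` parameters are hypothetical, and only the normal paths are modeled (MISS, NOT_FOUND, ERROR, and UNKNOWN are treated as off-path).

```python
# Normal paths as listed above.
INPUT_PATH = ["NULL", "REQUESTED", "STAGING", "READY", "IMPORTING", "DONE"]
OUTPUT_PATH = ["NULL", "FOUND", "EXPORTING", "DONE"]

# Input statuses that occur only for file types needing pre-staging (HPSS).
PRESTAGE_ONLY = {"REQUESTED", "STAGING", "READY"}


def normal_path(kind, prestaged=False):
    """Return the normal status path for an 'input' or 'output' file."""
    if kind == "output":
        return list(OUTPUT_PATH)
    if kind == "input":
        return [s for s in INPUT_PATH if prestaged or s not in PRESTAGE_ONLY]
    raise ValueError("kind must be 'input' or 'output'")


def is_normal_prefix(observed, kind, prestaged=False):
    """True if the observed statuses match the start of the normal path."""
    path = normal_path(kind, prestaged)
    return list(observed) == path[:len(observed)]
```

An input file that does not need pre-staging follows `NULL`, `IMPORTING`, `DONE`; a history such as `["NULL", "NOT_FOUND"]` for an output file is reported as off the normal path.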
Queue Priority
Each queue's share of the farm is calculated from its priority using the following formula. The resulting factor for each queue q, based on its priority p_q, is then normalized so that the factors of all queues sum to 1.0.
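The normalization step can be sketched independently of how the per-queue factors are derived from p_q. The example below assumes the factors have already been computed; the function name `normalize_shares` and the queue names in the usage note are hypothetical.

```python
def normalize_shares(factors):
    """Scale per-queue factors so that they sum to 1.0.

    `factors` maps a queue name to its (already computed) priority factor.
    """
    total = sum(factors.values())
    if total <= 0:
        raise ValueError("at least one queue must have a positive factor")
    return {queue: factor / total for queue, factor in factors.items()}
```

With factors of 1.0 and 3.0 for two hypothetical queues `low` and `high`, the normalized shares are 0.25 and 0.75 of the farm.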
Queued Job Scaling
The target size of the queue of idle and staging jobs shrinks as the farm fills up. You provide two parameters in the config file, q_empty and q_full, both percentages. Each gives the total number of idle+staging jobs as a percentage of the slots available to run CRS jobs. For example, q_empty = 80 and q_full = 4 mean that with no jobs running the number of idle+staging jobs would be 0.8 * farm_size, and with the farm full it would be 0.04 * farm_size. Between these two points the target decays exponentially, as given by the following formula.
Where 0 < q_n < 1 (in the configs these are percentages, so enter values in the range 0-100)
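To illustrate the endpoint behavior described above, one curve that decays exponentially from q_empty to q_full is a geometric interpolation in the farm's fill fraction. This is an assumed form, chosen only because it matches the two endpoints; it is not necessarily the formula CRS uses. The function name and its parameters are hypothetical.

```python
def target_queue_size(running, farm_size, q_empty_pct, q_full_pct):
    """Illustrative target number of idle+staging jobs.

    ASSUMPTION: geometric interpolation between the two endpoints,
        target = farm_size * q_empty * (q_full / q_empty) ** fill
    where fill is the fraction of slots currently running jobs.
    """
    # Config values are percentages (0-100); convert to fractions.
    q_empty = q_empty_pct / 100.0
    q_full = q_full_pct / 100.0
    if not (0 < q_empty < 1 and 0 < q_full < 1):
        raise ValueError("q_empty and q_full must be strictly between 0 and 100")
    if farm_size <= 0:
        raise ValueError("farm_size must be positive")

    # Clamp the fill fraction to [0, 1].
    fill = min(max(running / farm_size, 0.0), 1.0)
    return farm_size * q_empty * (q_full / q_empty) ** fill
```

With q_empty = 80, q_full = 4, and a hypothetical farm of 1000 slots, this gives a target of 800 queued jobs when the farm is empty and 40 when it is full, matching the example in the text.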