By Louis Pelosi

Job Statuses

CREATED
In the CRS system but not yet a candidate to run; not yet seen by the submit-daemon
QUEUED
Submitted to the negotiator, will eventually run
STAGING
If inputs require staging, the job will enter this state
SUBMITTED
All stage-files (if any) are ready, submitted to Condor
IMPORTING
Condor job is running, processing the input files
RUNNING
All input files staged to node, job executing now
EXPORTING
The job's executables have ended; output files are being exported
KILLED
Jobs that were staging, submitted, or running can be killed. Can be reset from here.
HELD
Jobs that get Held by Condor enter this state. Can be reset from here.
RETRY
Jobs whose files were staged but had migrated back to tape by the time the job ran; these are automatically resubmitted.
ERROR
Jobs where something went wrong, see error message for details
DONE
Jobs that have finished without error, will be cleaned up if auto_remove is set
Normal Job Progression

CREATED->QUEUED->[STAGING]->SUBMITTED->IMPORTING->RUNNING->EXPORTING->DONE

With QUEUED and STAGING being handled by the submit-daemon, and IMPORTING through EXPORTING by Condor.
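The normal progression above can be sketched as a simple state machine. This is an illustrative sketch only; the status names come from the list above, but the function and its structure are hypothetical, not the actual CRS implementation.

```python
# Normal CRS job progression, per the status list above.
NORMAL_PROGRESSION = [
    "CREATED", "QUEUED", "STAGING", "SUBMITTED",
    "IMPORTING", "RUNNING", "EXPORTING", "DONE",
]

def next_status(current, needs_staging=True):
    """Return the next status in the normal progression.

    STAGING is skipped when the job's inputs need no staging.
    DONE is terminal, so it returns None.
    """
    order = [s for s in NORMAL_PROGRESSION
             if needs_staging or s != "STAGING"]
    i = order.index(current)
    if i == len(order) - 1:
        return None
    return order[i + 1]

print(next_status("QUEUED"))                       # STAGING
print(next_status("QUEUED", needs_staging=False))  # SUBMITTED
```

KILLED, HELD, RETRY, and ERROR are detours off this normal path and are not modeled here.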


File Status Sequences

Each file goes through a series of status sequences just like a job.  Here are the possible statuses and their transitions.

Input Files

NULL
Files created in this state by default
REQUESTED
Files that have stage requests in the HPSS Batch system
STAGING
Files currently being staged by HPSS, not yet ready
READY
Files whose stage requests have come back successfully
MISS
Files that were staged, but the job ran later and found them wiped from the cache
IMPORTING
Files that are currently being PFTP'd to the execute node
DONE
Files that have been imported successfully
ERROR
Something went wrong staging/importing the file, see message for details
UNKNOWN
Files lost/dropped by HPSS Batch system
Normal Progression

NULL->[REQUESTED->STAGING->READY]->IMPORTING->DONE

Where REQUESTED through READY are handled within the submit-daemon, and only for file types that require pre-staging (HPSS)
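The input-file statuses above can be summarized as a transition table. This table is a hedged reconstruction from the descriptions above: the normal path and the documented detours (MISS, ERROR, UNKNOWN) are shown, and the MISS-to-REQUESTED re-stage edge is an assumption based on the RETRY behavior, not a confirmed detail.

```python
# Assumed input-file transitions, reconstructed from the status list above.
INPUT_TRANSITIONS = {
    "NULL": {"REQUESTED", "IMPORTING"},  # IMPORTING directly if no pre-staging
    "REQUESTED": {"STAGING", "ERROR", "UNKNOWN"},
    "STAGING": {"READY", "ERROR", "UNKNOWN"},
    "READY": {"IMPORTING", "MISS"},
    "MISS": {"REQUESTED"},               # assumed: re-stage after a cache miss
    "IMPORTING": {"DONE", "ERROR"},
}

def is_valid_transition(old, new):
    """True if old -> new appears in the assumed transition table."""
    return new in INPUT_TRANSITIONS.get(old, set())
```

DONE, ERROR, and UNKNOWN are treated as terminal here, so any transition out of them is rejected.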

Output Files

NULL
Default state, files created here
FOUND
Job finished executing and the output file is found in the working-directory
EXPORTING
Job is actively staging the file out
DONE
File is in its final destination
NOT_FOUND
Job finished but file wasn't created by the job
ERROR
Something went wrong, see message for details
Normal Progression

NULL->FOUND->EXPORTING->DONE
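An output file's history can be classified by its last status. The status names are from the list above; the helper itself is an illustrative sketch, not part of CRS.

```python
# Classify an output file from its status history (names per the list above).
def output_outcome(history):
    """Map an output file's status history to a human-readable outcome."""
    last = history[-1]
    if last == "DONE":
        return "exported"       # file reached its final destination
    if last == "NOT_FOUND":
        return "missing"        # job finished but never created the file
    if last == "ERROR":
        return "failed"
    return "in progress"        # NULL, FOUND, or EXPORTING

print(output_outcome(["NULL", "FOUND", "EXPORTING", "DONE"]))  # exported
```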

Queue Priority

The following formula shows how shares of the farm are calculated from queue priorities: a factor is computed for each queue q from its priority p_q, and the factors are then normalized to sum to 1.0.

formula 1

formula 2
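Since the exact factor formula is not reproduced here, the sketch below only illustrates the normalization step, under the simplifying assumption that each queue's factor is just its priority p_q; the queue names and function are hypothetical.

```python
# Illustrative normalization of per-queue factors to shares summing to 1.0.
# Assumption: factor(q) = p_q; the real CRS factor formula may differ.
def farm_shares(priorities):
    """Map {queue: p_q} to {queue: share of the farm}, shares summing to 1.0."""
    total = sum(priorities.values())
    return {q: p / total for q, p in priorities.items()}

print(farm_shares({"queue_a": 3.0, "queue_b": 1.0}))
# {'queue_a': 0.75, 'queue_b': 0.25}
```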

Queued Job Scaling

The target size of the queue of idle and staging jobs shrinks as the farm fills up.  You provide two parameters in the config file, q_empty and q_full, which are percentages.  These represent the total number of idle+staging jobs as a percentage of the slots available to run CRS jobs.  For example, q_empty = 80, q_full = 4 would mean that with no jobs running, the number of jobs staging/submitted would be 0.8 * farm_size, and when the farm is full, 0.04 * farm_size.  There is an exponential decay between these two points given by the following formula.

formula 3


Where 0 < q_empty, q_full < 1 (in the configs these are percentages, so enter values between 0 and 100)
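As a worked example of the endpoints described above: the sketch below assumes the decay is a geometric interpolation between q_empty and q_full, which matches the two endpoints in the text (q_empty with no jobs running, q_full with the farm full); the actual formula 3 may differ.

```python
# Assumed exponential decay of the target idle+staging queue size
# as farm utilization rises from 0 to 1 (not the exact CRS formula).
def target_queue_size(farm_size, running, q_empty=0.80, q_full=0.04):
    """Target number of idle+staging jobs for a given farm utilization.

    q_empty and q_full are fractions (config values are percentages / 100).
    """
    u = running / farm_size                  # utilization, 0.0 .. 1.0
    q = q_empty * (q_full / q_empty) ** u    # q_empty at u=0, q_full at u=1
    return int(q * farm_size)

print(target_queue_size(1000, 0))     # 800: empty farm, 0.8 * farm_size
print(target_queue_size(1000, 1000))  # 40: full farm, 0.04 * farm_size
```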