By Kevin Casella

A CRS job is defined using a job description file (henceforth a JDF) that is formatted as a pretty standard config file:

# A comment starts with a pound-sign

[section-name]
variable = value
something = else

where keys and values can be separated by an equals sign or a colon. There are four mandatory sections: [main], [exec-0], [input-0], and [output-0]. There can be more inputs/outputs, but one of each is required (else what's the point?). The input and output sections are named [input-n], where n runs from 0 to num_inputs-1 (and likewise for outputs). What follows is a basic example of a JDF with all the required sections. For more detail on the types of files, the exact arguments, and some optional variables, see the example job files.

# (for a job with 2 inputs and 1 output)

[main]
num_inputs = 2
num_outputs = 1

name = sample_job

[exec-0]
exec = <path of executable>
args = <arguments to exec>
env = <additional environment variables>

[input-0]
type = <HPSS/LOCAL/etc...>
path = <the directory, i.e. what would be returned by `dirname`>
file = <just the file name>

[input-1]
type = <etc...>
...

[output-0]
type = <HPSS/DCACHE/etc...>
path
file

[...]

[output-n]
...

This file’s name is passed as an argument to "crs_job -create/-c", which reads it, validates it, and inserts it into the database (see the database-schema docs), giving it a unique job-id. The job then needs to be submitted to the CRS submit-daemon ("crs_job -submit <jobid>"); alternatively, "crs_job -insert <file>" does these two steps in one.
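For example, a typical create-then-submit sequence might look like the following (the file name and job-id are hypothetical; we assume -create reports the id it assigns):

crs_job -create myjob.jdf
crs_job -submit 1234

or, in one step:

crs_job -insert myjob.jdf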

When a job is submitted it progresses through various phases (see whitepapers/status-sequences). The job’s state is tracked in the database as it moves through these phases, and the database keeps all the information about the job once it is in the CRS system.

Job Submission

Jobs can be submitted to different queues by adding a queue=<name> variable to the [main] section of the JDF. Queues are defined in the database (and can be edited with the cmdline tool crs_qedit) by a name and a priority. Priorities work similarly to inverted nice values, with 0 being the default and negative numbers meaning lower priority. Each negotiation cycle, CRS will run a number of jobs from each queue, targeting a steady state in which each queue has a share determined by its priority value. See job-queues for more info.
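For example, to direct a job at a hypothetical queue named "express" (the queue must already exist in the database, e.g. created with crs_qedit):

[main]
queue = express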

Environment Variables

Each executable will run in an environment with the following variables set (to keep compatibility with the old system):

  • CRS_STDOUT_DIR / CRS_STDERR_DIR = the executable’s standard output/error directory (defaults to <exec-dir>)

  • STD_OUT / STD_ERR = the executable’s standard output/error filename (defaults to <executable-name><exec-number>.stdout / .stderr)

  • CRS_JOB_NAME = only defined if the name attribute is defined in the job

  • ACTUAL_INPUT<n> / ACTUAL_OUTPUT<n> = the local file’s path (after transfer from remote storage, if applicable)

  • INPUT<n> / OUTPUT<n> = the full path of the input/output file (including e.g. the HPSS/DCACHE part)
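As a rough sketch, an executable could use these variables like so (a csh script, since the examples later in this document use .csh wrappers; the echo lines are purely illustrative):

#!/bin/csh
# ACTUAL_INPUT0 points at the locally staged copy of input 0
echo "job $CRS_JOB_NAME reading $ACTUAL_INPUT0"
# OUTPUT0 is the full destination path, including the HPSS/DCACHE part
echo "output 0 will be shipped to $OUTPUT0"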

In addition to the previous variables, you can specify other environment variables as a semicolon-delimited list in [exec-n]'s env= variable, e.g.

env = VAR_1=val1;VAR_2=val2;SOMETHING=nothing

Environment variables set in this manner will overwrite those set automatically (like INPUT0 from above). Likewise, variables set either here or automatically will overwrite environment variables provided to the program by condor or some other method. These variables are provided to the called programs only, not to the running job itself.
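For instance, a line like the following in an [exec-n] section would replace the automatically set INPUT0 for that executable (the path is purely illustrative):

env = INPUT0=/some/other/local/path.daq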

Error Handling

Jobs can specify optional callbacks to be executed when they enter an error state or when they finish; these callbacks are executed wherever the state transition occurs. That could be on either the submit or the execute node, and they are run without a timeout or error-checking, so please keep them short, error-free (if possible :)!), and able to run anywhere (best kept in NFS or AFS).

When a job enters the error state it will have an error message associated with it, which can be read by running "crs_job -print/-long <jobid>".

If there is an error with a stage request, it will also be visible in the output of the -print / -long command.
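For example, with a hypothetical job-id of 1234:

crs_job -print 1234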

Misc

Jobs that land on a node and find their files no longer in the HPSS disk cache exit, and their status is set to RETRY. The submit daemon will pick up any RETRY jobs and resubmit them unless they have already gone through max_tries attempts at running (max_tries is defined in [main] and defaults to 2).

auto_remove is a boolean flag that determines if jobs that finish properly stay in the queue. If it is set, the jobs will remain in the DONE state for a few hours before being cleaned up automatically.

Running Jobs

Setting Up

The software comes in three packages: crs-common, crs-run, and crs-submitd; the last two depend on the first. crs-run contains the executable itself and must be installed on the farm nodes where you want to run CRS jobs (jobs don’t transfer executables as in the old system). crs-submitd contains two daemons, crs_submitd and crs_logserver, which run on your submit host to schedule jobs and collect logs from them. It also contains the command line interface for modifying jobs and queues.

On the submit host, you need the crs_submitd and crs_logserver daemons running; these can be started with "crs_submitd start" and the corresponding command for the log-server. These daemons run as your user and read their configuration file from the "CRS_CONF" environment variable (set it in your .cshrc or similar). The default config files are in /etc/crs/. If you wish to tune some parameters, you can put a local copy under /home and point the environment variable there. By default, the system expects the condor job-templates to be under ~/crs/templates, and you can modify those there.
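A typical setup might look like the following (the local config path is hypothetical, and we assume the log-server is started the same way as the submit daemon):

# in your .cshrc or similar
setenv CRS_CONF /home/<user>/crs/crs.conf

# start the daemons on the submit host
crs_submitd start
crs_logserver start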

Guide to running jobs

See the output of "crs_job --help" for detailed help, or run "crs_job --help-cmd XXX", where XXX is a command name, to get help on the specific syntax of that command. Shorthand for this is crs_<command>, where <command> is what you would have put after the "--".

  1. Write a job description file (JDF) according to the syntax guides provided, or convert old CRS files using the provided jdfconvert program.

  2. When the JDF is ready, create the job by running:

    crs_job -create <file>

    where <file> can be a glob or a list of files. This will insert a job into the database, assign it a job-id, and leave it in the CREATED state.

  3. In order to make the job a candidate to run, you must submit it to the negotiator, using:

    crs_job -submit <jobid>

    This allows the CRS negotiator to consider the job to run next cycle and puts it into the QUEUED state. Eventually, the negotiator will submit the job to condor, and it will progress to the SUBMITTED and then RUNNING states as it lands on the farm. The job should now run and will enter DONE when finished, or ERROR if something goes wrong.

    • Note: the command crs_job -insert <file> will create a job and submit it to the negotiator at once.

  4. Monitor the progress of jobs using the -stat / -summary / -machines commands. Also see the log messages that running jobs emit to the logserver, which are put (by default) into ~/job-logs.
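    For example, for a quick overview of your jobs (assuming no further arguments are needed):

    crs_job -stat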

CRS Queues

Queues are implemented in CRS to allow jobs to be classified, and to let this classification determine how many resources each class of job has access to. Each queue is assigned a priority number that determines its share of the available farm resources. Valid priorities are between -10.0 and +10.0 and may be fractional. A higher priority value means the queue has more of the farm allocated to it. Assuming there are jobs waiting in every queue, CRS will submit jobs in an effort to have the farm’s steady state reflect the size of each queue (as determined by its priority value). If one of the queues has no jobs, CRS will allocate its extra slots to the queues that do have jobs, in queue-priority order.

The tool "crs_qedit" (run it with --help to see usage) is helpful here, as it allows you to create, modify, and delete queues. Via the -show flag you can see the hypothetical share of the farm each queue would get if every queue had jobs waiting in it. As mentioned above, a queue that is empty, or has fewer jobs waiting than its share of the farm, has its extra slots made available to the other queues in priority order.
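For example, to display those shares:

crs_qedit -show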

Job Example

Main Section

The main section declares the number of inputs the job expects to consume and outputs it expects to produce, along with an optional queue selection (which defaults to a queue named "default" that is always present in the database).

[main]
queue = low

num_inputs = 3
num_outputs = 3

This is how many HPSS cache misses are allowable (cycles in which the job would go through STAGE→RUN→RETRY→STAGE again) before it fails altogether:

max_tries = 2

There are also two optional callbacks, executed when the job either completes or fails with an error. They will be interpreted by a shell (pipes, etc. are OK). These callbacks can be executed wherever the job enters an error condition, either during staging (on the submit node) or while running (on the farm node), so please specify an executable that is accessible everywhere (on a network filesystem or in the image):

on_error = /path/to/executable -arg1 arg2
on_complete = echo "Done" | nc somehost 2222

Jobs can be given a name, which must be alphanumeric plus underscore or dash only, and must be unique: jobs cannot have conflicting names in the database. You can refer to jobs by the name given here on the command line wherever a <jobspec> field is specified in the documentation:

name = something_alphanumeric_here

Jobs with this set to "True"/"Yes"/"1"/"on" will be removed from the database automatically within 2 hours of entering the DONE state:

auto_remove = true

Executable and Files Sections

Notice how, with more than one [exec] section, you need to specify the pertinent inputs and outputs for each executable. Notice also that you can repeat inputs across multiple executables, and that the set of input/output files declared in all executables must cover all files defined below, or you will get an error. If the gzip_output flag is present and set to True, the standard output and error streams will be compressed before being written to their final locations. If the flag is set to "stderr" or "stdout", only the named stream will be compressed; "both" is synonymous with "True" or "yes" and will result in both streams being compressed.

[exec-0]
exec = /afs/rhic.bnl.gov/star/something-here/root_caller.csh
args = -t 32 -D -s '/star/u/' %(inputs)s
inputs = 0,1
gzip_output = True
stdout = /phenix/zdata99/whatever/this.txt
stderr = /tmp/joblog/that.txt
outputs = 1

[exec-1]
exec = /afs/rhic.bnl.gov/star/something-here/root_caller_again.csh
args = -t 32 -D -s '/star/u/'
inputs = 0,2
outputs = 0

[exec-2]
exec = /afs/rhic.bnl.gov/star/something-here/wrap_up
args = -v -q 23
env = CONFIG_DIR=/home/starreco/cfg;EXAMPLE_ENV=whatever
gzip_output = stdout
stdout = /phenix/zdata99/whatever/this2.txt
stderr = /tmp/joblog/that2.txt
outputs = 2


# ************ Input files ************

[input-0]
type = HPSS
path = /home/starsink/raw/daq/2010/035/11035070/
file = st_upc_11035070_raw_3280001.daq

[input-2]
type = HPSS
path = /home/starsink/raw/daq/2010/035/11035219/
file = st_upc_11411579_raw_5341001.daq

[input-1]
type = HPSS
path = /home/starsink/raw/daq/2010/033/11033073/
file = st_physics_11033073_raw_3030002.daq

# ************** Outputs *******************
[output-0]
type = HPSS
path = /home/starreco/reco/AuAu2010_production/ReversedFullField/P10ij/2010/035/
file = st_upc_1242424234

[output-1]
type = DCACHE
path = /pnfs/rcf.bnl.gov/star/starreco/scratch//run10auau_zerof_pro85/what/
file = DST_MPC_UP_run10auau_zerof-2432342341121

[output-2]
type = LOCAL
path = /phenix/zdata01/phnxreco/run10auau_zerof_pro85/2342_DST_242323/
file = testfile