By William Streck…

HTCondor jobs require that job description files (JDFs) follow a certain syntax. Some examples of JDFs can be found here.

An example analysis job file would look like the following:

# All local jobs are part of the vanilla universe.
Universe        = vanilla

# The executable we want to run.
Executable      = /bin/echo

# The argument to pass to the executable.
Arguments       = "test job"

# The requirement line specifies which machines we want to
# run this job on.  Any arbitrary classad expression can
# be used.
Requirements    = (CPU_Speed >= 1)

# Rank is an expression that states how to rank machines which 
# have already met the requirements expression.  Essentially, 
# rank expresses preference.  A higher numeric value equals better 
# rank.  Condor will give the job the machine with the highest rank.
Rank		= CPU_Speed

# Jobs get 1.4 GB of RAM allocated by default; ask for more if needed,
# but a job that needs more than 2 GB will not be able to run on the
# older nodes.
request_memory = 1800M

# If you need multiple cores you can ask for them, but scheduling
# may take longer the "larger" the job you request.
request_cpus = 1

# This flag orders only your own submitted jobs relative to each
# other.  The jobs with the highest numbers get considered for
# scheduling first.
Priority        = 4

# Copy all of the user's current shell environment variables 
# at the time of job submission.
GetEnv          = True

# Sets the job's working directory, used to resolve input and
# output file paths.
Initialdir      = /experiment/u/user/jobdir/

# Input file given to the job.
Input           = /dev/null

# The job's stdout is sent to this file.
Output          = /experiment/u/user/myjob.out

# The job's stderr is sent to this file.
Error           = /experiment/u/user/myjob.err

# The HTCondor log file for this job, useful when debugging.
Log             = /experiment/u/user/myjob.log.$(Cluster)


# This should be the last command and tells condor to queue the
# job.  If a number is placed after the command (e.g. Queue 15)
# then the job will be submitted that many times.  Use the $(Process)
# macro to make your input, output, and log files unique.
Queue
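
For example, a minimal sketch (the filenames are illustrative) that submits 15 copies of this job, using the $(Process) macro to keep each copy's output separate:

# $(Process) expands to 0, 1, ..., 14 -- one value per queued job.
Output          = /experiment/u/user/myjob.$(Process).out
Error           = /experiment/u/user/myjob.$(Process).err
Log             = /experiment/u/user/myjob.log.$(Cluster)
Queue 15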

There are a few more commands that make HTCondor more useful. It is recommended that you place all of your input, output, and executable files on an NFS filesystem; the example job file above (and those mentioned at the top of this page) assume this. When you specify an output file on an NFS filesystem, you can view your job's output in real time (provided your job doesn't buffer its output). In some circumstances it can be useful to buffer the output on the execute node instead:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT

The two commands above will transfer the necessary input files and the executable to the execute node. When the job completes, the output files are transferred back to the directory specified with the Initialdir command above (Iwd in the job's ClassAd). The transfer_output_files command should NOT be used unless you really know what you are doing. From the HTCondor manual: other than for globus universe jobs, if transfer_output_files is not specified, HTCondor will automatically transfer back all files in the job's temporary working directory which have been modified or created by the job.
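
For instance, if your job also reads an extra data file, it can be staged to the execute node explicitly with transfer_input_files. A minimal sketch (the staged filename is hypothetical):

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
# Hypothetical data file copied to the execute node before the job starts.
transfer_input_files    = /experiment/u/user/jobdir/config.dat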


Advanced Submission

See this section of the HTCondor manual for an example of using arguments and filename globs with the Queue command.
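
As a rough sketch of that style of submission (the *.dat glob and the infile variable name are illustrative), the Queue command can iterate over files matching a glob, queueing one job per match:

Executable      = /bin/echo
# $(infile) is set to each matching filename in turn.
Arguments       = "$(infile)"
Queue infile matching files *.dat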


Once you've created your job file, you can submit it with the condor_submit command:

$ condor_submit example2.job
Submitting job(s).
Logging submit event(s).
<# of jobs> job(s) submitted to cluster <JOB_ID>.

You can check the progress of your job with:

$ condor_q -submitter <username>

or with:

$ condor_q -analyze <JOB_ID>

or for more information with:

$ condor_q -better-analyze <JOB_ID>

You can also kill your condor job with:

$ condor_rm <JOB_ID>

More complex batch jobs can be submitted via shell scripts, for example by pointing the Executable command at a wrapper script, as sketched below.
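
A minimal sketch of such a wrapper (the script name and the steps inside it are hypothetical):

#!/bin/sh
# myjob.sh -- hypothetical wrapper run on the execute node
set -e
cd /experiment/u/user/jobdir
./prepare_input.sh               # hypothetical setup step
./analysis input.dat > out.dat   # hypothetical analysis step

The JDF then points at the script with Executable = /experiment/u/user/jobdir/myjob.sh; the rest of the submit file stays the same.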

There are a variety of condor commands to monitor activity and status of condor jobs. To summarize jobs by owner:

$ condor_status -submitters

To summarize jobs by server:

$ condor_status -claimed

As always, read the man pages for more detailed information, examples and descriptions of options. The most commonly used commands are:

  • condor_submit
  • condor_rm
  • condor_q
  • condor_status