The SDCC has installed HTCondor (version 6.8) on all of its Linux compute nodes to control analysis and reconstruction jobs. This distributed batch queuing system implements sophisticated job scheduling and policy-driven resource allocation control to ensure optimal performance. It can tolerate the failure of any host or group of hosts, and it provides job resubmission, checkpointing, job migration, and many other features. It can also make use of specific hosts based on a time schedule or on the current load level. HTCondor has built-in fair-share scheduling, which guarantees access to resources for users or groups of users. The system supports pre- and post-execution commands. For more information, please check the HTCondor web site.

Setup

HTCondor binaries are installed in /usr/bin and /usr/sbin. Users normally only need to make sure that both directories are in their default $PATH; no other adjustments to the working environment are required.
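
One way to confirm this from a shell on a farm node is sketched below. The helper name in_path is made up for this illustration and is not part of HTCondor; condor_version is a standard HTCondor command that prints the installed version, and it is only called if it can be found.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor): succeeds if the
# directory given as $1 is listed in $PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the two directories that hold the HTCondor binaries.
for dir in /usr/bin /usr/sbin; do
    if in_path "$dir"; then
        echo "$dir is in PATH"
    else
        echo "$dir is missing from PATH"
    fi
done

# If the HTCondor tools are reachable, report the installed version.
if command -v condor_version >/dev/null 2>&1; then
    condor_version
fi
```

If a directory is reported as missing, adding it in the shell startup file (for example PATH="$PATH:/usr/sbin" in a Bourne-style shell) should be enough.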

Documentation

HTCondor manuals are available online from the HTCondor manual page.

Manpages are available for almost all of the HTCondor commands. From any of the farm nodes, you can access them just as you would any other manpage (e.g. man condor_submit).


HTCondor - Multiple Job Submission

Users rarely submit just one job to the batch queue. More often, a user submits hundreds of jobs, using custom scripts and tools to automate the process. HTCondor provides tools to make multiple job submission easier. There are two advantages to using these built-in features: first, job submission will be several times faster; second, HTCondor can provide more robust scheduling if sets of jobs are submitted using these tools.
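
As a rough sketch of the built-in approach, the script below writes a single description file that queues several jobs at once, rather than calling condor_submit once per job. The executable name analyze.sh, the file names, and the job count are placeholders invented for this example. The queue N statement and the $(Process) macro, which takes the values 0 through N-1, are standard submit-file features.

```shell
#!/bin/sh
# Placeholder values for illustration only.
NJOBS=5
SUBMIT_FILE=multi.job

# Write one description file that covers all jobs.
# The quoted 'EOF' stops the shell from expanding $(Process), so
# HTCondor sees the macro literally and substitutes 0..NJOBS-1.
cat > "$SUBMIT_FILE" <<'EOF'
universe   = vanilla
executable = analyze.sh
arguments  = $(Process)
output     = analyze.$(Process).out
error      = analyze.$(Process).err
log        = analyze.log
EOF

# Append the queue statement, with the job count filled in by the shell.
echo "queue $NJOBS" >> "$SUBMIT_FILE"

# The file is not submitted automatically; print the command instead.
echo "Submit with: condor_submit $SUBMIT_FILE"
```

Each job then receives its own index as an argument and writes to its own output and error files, while all jobs share one log file.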

To demonstrate how to submit multiple jobs using HTCondor, we start with a vanilla job description file: