By William Streck…

The shared Tier-3 facility uses the HTCondor batch system. This document explains how to use HTCondor in the Tier-3 environment.

Please see this documentation for more general information regarding HTCondor at the RACF.

Running Jobs

Our environment uses a feature of HTCondor called Accounting Groups to share resources among the various institutions using this pool.

Below is a template you can use to get started running jobs in our pool.

Universe        = vanilla

Notification    = Never
Executable      = /path/to/code/executable
Arguments       = arg1 20 $(cluster)

# Copy the environment from the submit host into the job
GetEnv          = True

# An expression where your job can specify its needs (not usually needed)
# Requirements   = (Machine == "some-machine.bnl.gov")

# Only set these if you want to transfer files to/from the submit machine
# should_transfer_files = YES
# when_to_transfer_output = ON_EXIT_OR_EVICT

# The standard output and error of your job's executable
Output          = job.$(cluster).$(process)
Error           = job.$(cluster).$(process)

# Should not need to transfer input files 

# Don't put the user-log file on NFS disk as it can be slow,
# you can create a temporary directory on the submit node
# !! NOTE: your home directory is likely on NFS
Log             = /tmp/<username>/log.$(cluster).$(process)

request_memory = 2.5G

# Queue submits the job; with "Queue <N>", N jobs are submitted and $(process) runs from 0 to N-1
Queue 10
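
As a minimal sketch of the submission step, assume the template above has been saved as job.sub (a hypothetical file name) and that <username> in the Log path is your login name. The directory for the user log is created first, then the jobs are submitted with the standard condor_submit command and the queue is shown with condor_q. The guard only keeps the snippet harmless on a machine without HTCondor.

```shell
#!/bin/sh
# Hypothetical name for the file holding the template above.
SUBMIT_FILE="job.sub"

# Assumption: <username> in the Log path is your login name.
LOG_DIR="/tmp/${USER:-$(id -un)}"

# Create the directory for the user log before submitting.
mkdir -p "$LOG_DIR"

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit "$SUBMIT_FILE"   # submit the jobs described in the file
    condor_q                       # show the job queue
else
    echo "condor_submit not found; run this on a submit node" >&2
fi
```

Since the template places the log in /tmp on the submit node, the mkdir step likely has to be repeated on each node you submit from.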

 

Our Configuration and Manual Pages

Some details about our setup; the terminology will make more sense once you are familiar with HTCondor:

  1. We run only Vanilla universe jobs
  2. We use Accounting Groups to control resource allocation across institutions
    • The accept_surplus flag is on for all groups, so resources left unused by some groups are shared fairly among the groups with active users
    • Users must be listed in the config file as belonging to an institution to run jobs in that institution's group. If you are not, the group group_atlas.general serves as a catch-all until you get registered
    • Your jobs are automatically classified into your assigned accounting group
  3. There are limits on job-length and memory usage described in the Policy Document
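
To see how a running job compares with its memory request, one option is to print the relevant job attributes with condor_q. This is only a sketch: RequestMemory and MemoryUsage are standard HTCondor job attributes (both in megabytes), but the limits that actually apply here are defined in the Policy Document, not by this snippet.

```shell
#!/bin/sh
# Standard job ClassAd attributes. MemoryUsage may show as undefined
# until HTCondor has measured the job at least once.
ATTRS="ClusterId ProcId RequestMemory MemoryUsage"

if command -v condor_q >/dev/null 2>&1; then
    # JobStatus == 2 selects running jobs.
    # $ATTRS is left unquoted on purpose so each name is a separate argument.
    condor_q -constraint 'JobStatus == 2' -af $ATTRS
else
    echo "condor_q not found; run this on a submit node" >&2
fi
```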

See here for the manual page of condor_submit

And see here for the entire user manual to HTCondor

Group Membership

As mentioned above, you must be registered as belonging to an institution to submit jobs to its group. The accounting_group field is set automatically. If you see a warning like

*** You are not listed in an instituiton, defaulting to the general group. Please request membership in an institution

you need to ask to join your institution via RT.
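
A quick way to confirm which group your queued jobs landed in is to print their accounting attribute with condor_q. The sketch below assumes the automatically assigned group appears in the standard AccountingGroup job attribute; how this pool sets it is not described here, so treat the attribute name as an assumption.

```shell
#!/bin/sh
# Assumption: the assigned group is visible in this standard job attribute.
GROUP_ATTR="AccountingGroup"

if command -v condor_q >/dev/null 2>&1; then
    # Print one line per job: cluster id, process id, accounting group.
    condor_q -af ClusterId ProcId "$GROUP_ATTR"
else
    echo "condor_q not found; run this on a submit node" >&2
fi
```

A value beginning with group_atlas.general would suggest the job fell into the catch-all group mentioned above.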

The currently defined groups and users are listed here