The shared Tier-3 facility uses the HTCondor batch system. This document explains how to use HTCondor in the Tier-3 environment.
Please see this documentation for more general information regarding HTCondor at the RACF.
Running Jobs
Our environment uses a feature of HTCondor called Accounting Groups to share resources among the various institutions using this pool.
Below is a template you can use to get started running jobs in our pool.
Universe = vanilla
Notification = Never
Executable = /path/to/code/executable
Arguments = arg1 20 $(cluster)
# Copy the environment of the submitting shell into the job
GetEnv = True
# An expression restricting which machines may run the job (not usually needed)
# Requirements = (Machine == "some-machine.bnl.gov")
# Only set these if you want HTCondor to transfer files to/from the submit machine
# should_transfer_files = YES
# when_to_transfer_output = ON_EXIT_OR_EVICT
# The standard output and error of your job's executable
Output = job.$(cluster).$(process).out
Error = job.$(cluster).$(process).err
# You should not need to transfer input files
# Don't put the user log file on NFS disk, as it can be slow;
# instead, create a temporary directory on the submit node
# !! NOTE: your home directory is likely on NFS
Log = /tmp/<username>/log.$(cluster).$(process)
request_memory = 2.5G
# Queue submits the job; with a count <N>, N jobs are queued and $(process) runs from 0 to N-1
Queue 10
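To make the macro expansion concrete, here is a small Python sketch that mimics what condor_submit does with $(cluster) and $(process) for the Arguments and Log lines of the template when you use `Queue 10`. It is an illustration only: the cluster id 1234 is hypothetical (HTCondor assigns the real one at submit time), and the real expansion happens inside condor_submit, not in any code you write.

```python
# Illustration only: mimics how condor_submit fills in $(cluster) and
# $(process) for the template above. The cluster id used below is
# hypothetical; HTCondor assigns the real one when you submit.


def expand(value: str, cluster: int, process: int) -> str:
    """Substitute the $(cluster) and $(process) macros in a submit-file value."""
    return value.replace("$(cluster)", str(cluster)).replace("$(process)", str(process))


def queued_jobs(cluster: int, count: int) -> list:
    """Return (arguments, log) pairs for 'Queue <count>'.

    $(process) takes the values 0 .. count-1, one per queued job,
    while $(cluster) is the same for every job in the submission.
    """
    return [
        (
            expand("arg1 20 $(cluster)", cluster, process),
            expand("/tmp/<username>/log.$(cluster).$(process)", cluster, process),
        )
        for process in range(count)
    ]


if __name__ == "__main__":
    for arguments, log in queued_jobs(cluster=1234, count=10):
        print(arguments, log)
```

Every job gets the same arguments (they only use $(cluster)), but each one writes its own log file because $(process) differs.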
Our Configuration and Manual Pages
The following notes use HTCondor terminology that will make more sense once you are familiar with HTCondor:
- We run only Vanilla universe jobs
- We use Accounting Groups to control resource allocation across institutions
- The accept_surplus flag is on for all groups, so resources that some groups are not using are shared fairly among the other groups that have jobs to run
- Users must be listed in the configuration file as members of an institution to run jobs in that institution's group. If you are not listed, the group group_atlas.general serves as a catch-all until you are registered
- Your jobs are automatically classified into your assigned accounting group
- There are limits on job-length and memory usage described in the Policy Document
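The accept_surplus behaviour can be pictured with a small Python model. This is a simplified sketch, not HTCondor's negotiator: it assumes each group first gets up to its quota, and that leftover slots are then split in proportion to quota among groups that still have idle jobs. The real negotiator also takes user priorities and other configuration into account, and the group names, quotas and demand figures below are hypothetical.

```python
# Simplified model of accept_surplus sharing, NOT HTCondor's negotiator code.
# Group names, quotas and demand figures are hypothetical.


def allocate(total_slots: float, quotas: dict, demand: dict) -> dict:
    """Share total_slots among accounting groups.

    Each group first gets min(quota, demand). Slots left over are split among
    groups that still have idle jobs, in proportion to their quotas, repeating
    until the leftovers are used up or nobody wants more. Assumes the quotas
    sum to at most total_slots.
    """
    alloc = {g: float(min(q, demand.get(g, 0))) for g, q in quotas.items()}
    surplus = total_slots - sum(alloc.values())
    while surplus > 1e-9:
        # Groups whose demand is not yet met
        hungry = [g for g in quotas if demand.get(g, 0) - alloc[g] > 1e-9]
        weight = sum(quotas[g] for g in hungry)
        if not hungry or weight <= 0:
            break
        handed_out = 0.0
        for g in hungry:
            # Cap each group at its remaining demand; any excess is
            # redistributed on the next pass
            give = min(surplus * quotas[g] / weight, demand[g] - alloc[g])
            alloc[g] += give
            handed_out += give
        surplus -= handed_out
    return alloc


if __name__ == "__main__":
    quotas = {"group_a": 50, "group_b": 30, "group_c": 20}
    demand = {"group_a": 10, "group_b": 80, "group_c": 60}
    print(allocate(100, quotas, demand))
```

With these made-up numbers, group_a only needs 10 of its 50 slots, and the 40 it leaves idle are split 24/16 between group_b and group_c (their quotas are in a 30:20 ratio), giving 10, 54 and 36 slots.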
See here for the condor_submit manual page
And see here for the complete HTCondor user manual
Group Membership
As mentioned above, you must be registered as a member of an institution to submit jobs to its group. The accounting_group attribute is set automatically. If you see a warning like
*** You are not listed in an instituiton, defaulting to the general group. Please request membership in an institution
you need to request membership in your institution via RT.
The currently defined groups and users are listed here.
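The fallback rule described in this section can be summarised in a few lines of Python. This is a hypothetical sketch of the logic only: the membership table and the institution group names are made up for illustration, and the real mapping lives in the facility's configuration. Only group_atlas.general, the catch-all group, comes from this page.

```python
# Hypothetical sketch of the group lookup described above. The membership
# table and the institution group names are made up for illustration; only
# group_atlas.general (the catch-all) comes from this page.

CATCH_ALL_GROUP = "group_atlas.general"

# Hypothetical registrations: user name -> institution group
MEMBERS = {
    "alice": "group_atlas.institution_a",
    "bob": "group_atlas.institution_b",
}


def accounting_group(user: str, members: dict = MEMBERS) -> tuple:
    """Return (group, registered) for a user.

    Unregistered users fall back to the catch-all group; this is the case in
    which the submit host warns you to request membership.
    """
    group = members.get(user)
    if group is None:
        return CATCH_ALL_GROUP, False
    return group, True


if __name__ == "__main__":
    for user in ("alice", "carol"):
        print(user, accounting_group(user))
```

Here "carol" is not registered, so her jobs would land in group_atlas.general until she joins an institution.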