By William Streck…

Job Requirements

Jobs must belong to an Accounting Group, as documented in the Usage document.

Jobs must meet the following limits; jobs that do not will be evicted or will not start:

  • Must request the memory you need via request_memory (the default is 1.5G); a +20% grace window applies on top of the request
  • Must run for fewer than 3 days

Features:

  • We support multicore: you can ask for however many CPUs you require with request_cpus
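
A minimal submit-file sketch that sets these requests might look like the following. The executable name, the group name, the file names, and the resource values are placeholders chosen for illustration, and it assumes the group is set with HTCondor's accounting_group submit command; use the group name given in the Usage document.

```
# Hypothetical example: file names, group name, and values are placeholders
executable       = my_job.sh
accounting_group = group_example

# Request memory explicitly; otherwise the 1.5G default applies
request_memory   = 4GB
# Request as many CPU cores as the job actually uses
request_cpus     = 4

output = job.out
error  = job.err
log    = job.log

queue
```

Comments go on their own lines, since a trailing comment on a command line would be read as part of the value.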

Job Eviction Policy

Regular jobs are guaranteed 3 days of runtime before being preempted

Jobs are evicted unconditionally if they use more than 30% over the RAM they requested or run for more than 3 days.

Jobs that have been evicted for memory usage will be eligible to run again unless a periodic_hold or periodic_remove statement is added. We suggest the following:

periodic_hold = (NumJobStarts >= 1 && JobStatus == 1)

This holds jobs that have been put back into the Idle state (JobStatus 1) after starting at least once.
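
In a submit file, the suggested expression sits alongside the other commands, before the queue statement. The sketch below uses a placeholder executable, group name, and memory value; the commented-out periodic_remove line is an assumed variant that reuses the same condition to remove such jobs instead of holding them.

```
# Hypothetical example: my_job.sh, group_example, and 4GB are placeholders
executable       = my_job.sh
accounting_group = group_example
request_memory   = 4GB

# Hold the job if it returns to Idle (JobStatus 1) after having started once
periodic_hold = (NumJobStarts >= 1 && JobStatus == 1)

# Assumed alternative: remove the job outright instead of holding it
# periodic_remove = (NumJobStarts >= 1 && JobStatus == 1)

queue
```

Holding rather than removing keeps the job in the queue, so it can still be inspected and fixed before the cleanup policy removes it.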

Queue Cleanup Policy

Some system-wide expressions keep the Condor queues clear of old jobs.

  1. Jobs that have run at least once, been evicted for whatever reason, and have been Idle in the queue for over 7 days will be placed on Hold
    • Jobs that use too much memory or run too long frequently cannot start again; this policy keeps those old jobs from polluting the queue
  2. Jobs that are on Hold for over 3 weeks get removed
    • Jobs that are held by periodic_hold or by policy (1) above will be cleaned up automatically.

For example, if a job uses too much RAM and gets evicted so that it can't run again, it will be put on hold after 1 week and removed after 3 additional weeks, which gives the user about 1 month to clean up their jobs.