By William Streck…

Quick Start Changes

What do I need to do to run jobs as before? In general, not much; however, the following will be helpful:

  1. Do not set Requirements = (CPU_Experiment == "star / phenix / whatever...") in your jobs
  2. Request the resources you need (request_memory, etc., on jobs needing more than 1.5 GB)
  3. You no longer need +Experiment=***
  4. There is no longer a concept of a "general queue", and no more special lower policy limits
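As a sketch, a submit description following the points above might look like the following. The executable name and resource numbers are placeholders, not actual SDCC defaults:

```
# Hypothetical HTCondor submit description illustrating the points above.
universe   = vanilla
executable = my_analysis.sh        # placeholder script name
# Ask for what the job actually needs (only required above 1.5 GB):
request_memory = 2048M
request_cpus   = 1
# Note what is absent: no "Requirements = (CPU_Experiment == ...)" line and
# no +Experiment attribute -- either would only shrink the set of matching
# resources in the shared pool.
log    = job.log
output = job.out
error  = job.err
queue
```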

Policy

RAM must be requested if your job needs more than 1.5 GB. If a job exceeds its requested RAM by more than 30%, it will be evicted.

Jobs are allowed to run for up to 3 days (72 hours) before they are evicted. The ideal length for a Condor job is one to a few hours at most. For best throughput, split your work into jobs of that length.
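One common way to keep individual jobs in the one-to-a-few-hours range is to split the input into chunks and queue one job per chunk, letting $(Process) select the chunk. A minimal sketch, with hypothetical file names:

```
# Split a long task into many short jobs: one job per input chunk.
executable = process_chunk.sh      # hypothetical wrapper script
arguments  = input_$(Process).dat  # $(Process) runs 0..99 here
request_memory = 1024M
log    = chunk_$(Process).log
output = chunk_$(Process).out
error  = chunk_$(Process).err
queue 100
```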

 

Change Overview

The HTCondor systems are being integrated into a unified infrastructure, where pooled resources mean quicker access to a greater number of resources. In the new architecture, analysis jobs are first submitted to a shared pool that all experiments contribute to and that holds the most resources. Only if a job fails to match there will it attempt to flock back to the experiment's native pool.

Parameter Changes

The following parameters and flags change value or meaning in the new pool:

  • CPU_Experiment becomes "sdcc". Since jobs are submitted into the shared pool by default, requiring this to be any particular value limits the resources available to your job for no reason
  • CPU_Speed is now a small integer derived from benchmarks
  • +Job_Type / +Experiment may continue to be meaningful to individual experiments, but not in the main shared pool
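If you want faster machines preferred without restricting where a job can run, the benchmark-based CPU_Speed integer can be used in a Rank expression rather than in Requirements. A hedged sketch of the relevant submit lines:

```
# Prefer faster slots without excluding any: rank, don't require.
rank = CPU_Speed
# Avoid: requirements = (CPU_Speed > N)
# A Requirements clause excludes slots outright; Rank only orders them.
```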

 

Accounting Groups

Each experiment that uses the shared pool will have its jobs assigned to an accounting group in HTCondor. These groups have quotas, and in a steady-state scenario where all groups have jobs queued, the pool will be partitioned according to those quotas (proportional to the equipment each group has contributed to the pool). When one group is not using the pool, its slots will be allocated to the groups that have jobs to run.
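On the pool side, this behavior corresponds to HTCondor's accounting-group quotas with surplus sharing. A simplified negotiator-configuration sketch follows; the group names and quota numbers are purely illustrative, not the actual SDCC settings:

```
# Illustrative negotiator configuration, not the real SDCC values.
GROUP_NAMES = group_star, group_phenix
# Quotas proportional to contributed equipment (example numbers):
GROUP_QUOTA_group_star   = 1000
GROUP_QUOTA_group_phenix = 600
# Let quota unused by an idle group flow to groups with jobs waiting:
GROUP_ACCEPT_SURPLUS = True
```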

The average user does not need to concern themselves with these groups, since analysis jobs are assigned to the appropriate group automatically, without any user action.