From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Sun Apr 06 2003 - 09:06:57 EDT
Hi Flemming et al,

"Flemming Videbaek" <videbaek@sgs1.hirg.bnl.gov> wrote concerning
  LSF queues on RCF [Thu, 3 Apr 2003 15:46:31 -0500]
----------------------------------------------------------------------
> I requested a change to the queueing systems for our machines in
> part based on the experience we had with temporary overloading of
> the queue. It is certainly possible to expand on this as more
> experience is gained.
>
> The changes are summarized below. The main change is really that
> they are fairshare while it was set up as first-come first-served.

I guess that means that each of us has a certain amount of credit,
and we use that credit as we submit jobs. We regain credit through a
general reset each day, week, month, or so. Is that correctly
understood?

> This will prevent queues from being monopolized by a single user. For
> the purpose of production analysis the bramreco user
> will have a priority 3* that of others (this works well for
> start).

Perhaps we should list on some web page exactly who has access to
that account. Here are the ones I know about:

  Flemming (of course)
  Eun-Joo
  Steve
  Hiro
  Ian
  myself

The reason I think that info needs to be available is that the higher
priority of that user effectively circumvents the normal accounting
rules, which are what let one find the right person to blame with
very little hassle.

> The short high prio queue is intended for calibration runs,
> dst generations etc while the regular cas should be used for longer
> running jobs (acceptance maps, monte carlo).

What's the time limit on those queues? How long does a XXX run take?

> "The Brahms LSF queues have been reconfigured as requested by
> Flemming and their CRS nodes have been included in a new, separate
> queue.

CRS nodes in the LSF queue? Cool! Hopefully that means we'll exploit
that farm a little better in the future. After all, those nodes are
pretty idle as it is now.
Perhaps we should think of making some scripts that would sink HPSS
files from an LSF batch job, to test my hypothesis that the LSF batch
software is better than the hand-written Perl scripts and MySQL
backend currently used at the CRS farm. Perhaps we should also think
about running PROOF on that farm.

> The queues are summarized as follows:
>
>   brahms_cas        prio=30  standard fairshare queue
>   brahms_cas_short  prio=50  like brahms_cas with 1hr cpu time limit
>   brahms_crs        prio=30  like brahms_cas, but on CRS nodes with
>                              different load scheduling requirement

Perhaps we should take out the higher priority of the bramreco user
on the CAS nodes, and then give individual access to the CRS farm to
the people that currently have access to the bramreco user. That way,
we still have accounting, and everything is handled the same way: via
LSF.

> I (opfer Rind) will be turning the LSF daemons on for the CRS nodes,
> but the queue is currently closed. Flemming can activate the queue
> at his discretion."

Will you open the queue for (limited) access?

Yours,

 ___  |  Christian Holm Christensen
  |_| |  -------------------------------------------------------------
    | |  Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
     _|           DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
    _|            Denmark                    Office: (+45) 353 25 305
 ____|   Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
 | |
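
A note on the fairshare question above: the "credit" picture can be
made concrete with a small sketch. This is only a toy model under
stated assumptions, not the RCF configuration. It assumes that each
account holds a fixed number of shares (3 for bramreco and 1 for
everyone else, mirroring the "3*" in the quoted text) and that the
scheduler ranks accounts by shares divided by a weighted measure of
usage. The names Account, dynamic_priority, next_account and dispatch,
the account "other_user", and the two weighting factors are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    shares: int             # static shares; 3 vs 1 mirrors the "3*" above
    cpu_hours: float = 0.0  # CPU time charged so far (hypothetical unit)
    running: int = 0        # jobs currently running


def dynamic_priority(acct, cpu_factor=0.7, job_factor=3.0):
    """Shares divided by a weighted measure of usage.

    Both factors are illustrative assumptions, not real settings.
    """
    usage = acct.cpu_hours * cpu_factor + (1 + acct.running) * job_factor
    return acct.shares / usage


def next_account(accounts):
    """Account whose pending job would be started next."""
    return max(accounts, key=dynamic_priority)


def dispatch(accounts, slots):
    """Fill `slots` job slots one at a time; return jobs started per user."""
    started = {a.name: 0 for a in accounts}
    for _ in range(slots):
        acct = next_account(accounts)
        acct.running += 1   # more running jobs lowers the next priority
        started[acct.name] += 1
    return started


if __name__ == "__main__":
    accounts = [Account("bramreco", 3), Account("other_user", 1)]
    print(dispatch(accounts, 8))
```

In this model nobody's credit is reset on a schedule. An account's
priority simply drops while it is using the farm and recovers when it
stops. A real scheduler would presumably also let old CPU usage decay
over time, which the sketch leaves out.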