Re: LSF queues on RCF

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Sun Apr 06 2003 - 09:06:57 EDT

    Hi Flemming et al,
    
    "Flemming Videbaek" <videbaek@sgs1.hirg.bnl.gov> wrote concerning
      LSF queues on RCF [Thu, 3 Apr 2003 15:46:31 -0500] 
    ----------------------------------------------------------------------
    > I requested a change to the queueing systems for our machines, in
    > part based on the experience we had with temporary overloading of
    > the queue. It is certainly possible to expand on this as more
    > experience is gained. 
    > 
    > The changes are summarized below. The main change is really that
    > the queues are now fairshare, whereas they were set up as
    > first-come, first-served. 
    
    I guess that means that each of us has a certain amount of credit,
    and we use that credit as we submit jobs.  We gain credit by a general
    reset each day, week, month, or so.  Is that correctly understood? 
    
    > This will prevent queues from being monopolized by a single user. For
    > the purpose of production analysis the bramreco user 
    > will have a priority 3 times that of others (this works well for
    > a start). 
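
To make the fairshare idea concrete, here is a minimal sketch of how a share-based priority could behave. It is a simplified model, not the actual RCF configuration: the function names, the factor values, and the account `user_a` are all assumptions made for illustration. Only the 3-to-1 share ratio for bramreco comes from the text above.

```python
def dynamic_priority(shares, cpu_seconds, running_jobs,
                     cpu_factor=0.7, job_factor=3.0):
    """Simplified fairshare priority (assumed model, assumed factors).

    More shares raise the priority; accumulated CPU time and
    currently running jobs lower it.
    """
    usage = cpu_seconds * cpu_factor + (1 + running_jobs) * job_factor
    return shares / usage


def next_user(users):
    """Return the name of the user whose pending job would go first."""
    return max(users, key=lambda name: dynamic_priority(**users[name]))


# Nobody has used anything yet: bramreco wins on shares alone.
idle = {
    "bramreco": dict(shares=3, cpu_seconds=0.0, running_jobs=0),
    "user_a": dict(shares=1, cpu_seconds=0.0, running_jobs=0),  # hypothetical user
}

# bramreco has been busy: its extra shares no longer outweigh its usage.
busy = {
    "bramreco": dict(shares=3, cpu_seconds=3600.0, running_jobs=10),
    "user_a": dict(shares=1, cpu_seconds=0.0, running_jobs=0),
}

print(next_user(idle))
print(next_user(busy))
```

In this model there is no periodic reset of credit; a user's priority recovers only as the usage term in the denominator shrinks.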
    
    Perhaps we should list on some web-page just exactly who has access to
    that account.  Here are the ones I know about: 
    
      Flemming (of course)
      Eun-Joo 
      Steve 
      Hiro
      Ian 
      myself 
    
    The reason I think that info needs to be available is that the higher
    priority of that user effectively circumvents the normal accounting
    rules, which are what let one find the right person to blame with
    very little hassle. 
    
    > The short high-priority queue is intended for calibration runs,
    > DST generation, etc., while the regular cas queue should be used for
    > longer-running jobs (acceptance maps, Monte Carlo). 
    
    What's the time limit on those queues?  How long does a XXX run take? 
    
    > "The Brahms LSF queues have been reconfigured as requested by
    > Flemming and their CRS nodes have been included in a new, separate
    > queue.  
    
    CRS nodes in the LSF queue? Cool!  Hopefully that means we'll exploit
    that farm a little better in the future.  After all, those nodes are
    pretty idle as it is now.  Perhaps we should think of making some
    scripts that would sink HPSS files from an LSF batch job, to test my
    hypothesis that the LSF batch software is better than the
    hand-written Perl scripts and MySQL backend that are currently used at
    the CRS farm.  Perhaps we should also think about running PROOF on
    that farm. 
    
    > The queues are  summarized as follows:
    > 
    > brahms_cas       prio=30 standard fairshare queue
    > brahms_cas_short prio=50 like brahms_cas with 1hr cpu time limit
    > brahms_crs       prio=30 like brahms_cas, but on CRS nodes with
    >                          different load scheduling requirement 
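
As a rough illustration of how a submission script might choose between the two CAS queues listed above, the sketch below builds a `bsub` command line without running it. The queue names and the 1 hr CPU limit come from the summary; the helper names, the job name, and the script `run_calibration.sh` are hypothetical, and treating exactly 60 minutes as still fitting the short queue is an assumption.

```python
import shlex

# CPU time limit of brahms_cas_short, in minutes (from the queue summary).
SHORT_QUEUE_CPU_LIMIT_MIN = 60


def choose_queue(expected_cpu_minutes):
    """Pick the short queue only if the job is expected to fit its CPU limit."""
    if expected_cpu_minutes <= SHORT_QUEUE_CPU_LIMIT_MIN:
        return "brahms_cas_short"
    return "brahms_cas"


def bsub_command(job_name, command, expected_cpu_minutes):
    """Build (but do not run) a bsub argument list for the chosen queue."""
    queue = choose_queue(expected_cpu_minutes)
    # -q selects the queue, -J names the job.
    return ["bsub", "-q", queue, "-J", job_name] + shlex.split(command)


# Hypothetical calibration job expected to need about 20 CPU minutes.
print(" ".join(bsub_command("calib", "./run_calibration.sh 1234", 20)))
# Hypothetical Monte Carlo job expected to need about 8 CPU hours.
print(" ".join(bsub_command("mc", "./run_montecarlo.sh", 480)))
```

In practice one would probably leave some margin below the limit, since a job that exceeds the short queue's CPU time would presumably be killed.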
    
    Perhaps we should take out the higher priority of the bramreco user on
    the CAS nodes, and then give individual access to the CRS farm to the
    people who currently have access to the bramreco user.  In that way,
    we still have accounting, and everything is handled the same way - via
    LSF. 
    
    > I (Ofer Rind) will be turning the LSF daemons on for the CRS nodes,
    > but the queue is currently closed.  Flemming can activate the queue
    > at his discretion." 
    
    Will you open the queue for (limited) access? 
    
    Yours, 
    
     ___  |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    

