Re: calibrations on rcas

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Fri Mar 29 2002 - 12:12:03 EST

    Hi Steve, 
    
    On Fri, 29 Mar 2002 10:34:08 -0600
    "Stephen J. Sanders" <ssanders@ku.edu> wrote
    concerning "calibrations on rcas":
    > Hi,
    > I'm trying to get si/tile calibration done on rcas0019.  Unfortunately,
    > rcf has apparently implemented a time-out program that drops
    > my ssh connection before I can complete a replay. I don't know if
    > this is new, but I never encountered the problem before this week.
    > So...  I am trying to run bratmain in batch.  My problem is
    > that in the batch environment, bratmain can't find the shared
    > library that contains my replay modules:
    > 
    > Error in <TUnixSystem::DynamicPathName>: libmultReplay.so does not exist 
    > in .:/afs/rhic/opt/brahms/new/lib
    > 
    > What needs to be done to point to a local library?
    
    Make sure that you have the path to your libraries in the variable 
    
      Unix.*.Root.DynamicPath
    
    in the .rootrc file in your home directory or in the directory where
    you execute bratmain. 
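
    As a minimal sketch of such an entry: the helper below appends the
    line to a .rootrc that does not set the variable yet.  The function
    name add_dynamic_path and the example directory are made up for
    illustration; the /afs/... directory is the one from your error
    message.

```shell
# Hypothetical helper: append a DynamicPath entry to a .rootrc file.
# usage: add_dynamic_path <library dir> [<rootrc file>]
add_dynamic_path () {
  libdir=$1
  rootrc=${2:-${HOME}/.rootrc}

  # Leave the file alone if it already sets the variable
  if test -f "$rootrc" && grep -q 'Root\.DynamicPath' "$rootrc" ; then
    echo "$rootrc already sets Root.DynamicPath; edit it by hand" >&2
    return 1
  fi

  # Current directory first, then your own library directory, then the
  # default BRAHMS library directory
  echo "Unix.*.Root.DynamicPath: .:${libdir}:/afs/rhic/opt/brahms/new/lib" >> "$rootrc"
}
```

    For example, `add_dynamic_path ${HOME}/mylib' (a made-up directory)
    would write the entry to ${HOME}/.rootrc.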
    
    What do you mean by batch? CRS, LSF, or `nice ... &'? 
    
    For CRS (and CRASH), the documentation outlines what you need to do:
    You need to put your library in the same directory as your
    configuration script. 
    
    For LSF and `nice ...&', you need the entry in the .rootrc file. 
    
    I highly recommend that you (all of you) use LSF for long second,
    third, and later analysis passes.  Make a file like 
    
      #!/bin/sh 
      #BSUB -q brahms_cas
      #BSUB -o <standard out output file> 
      #BSUB -e <standard error output file> 
      #BSUB -J <name of your job> 
      
      unset DISPLAY 
    
      bratmain <configuration script> [<options>] 
    
    Then you can submit this to the LSF queue (brahms_cas) with 
    
      bsub < <script name> 
    
    See also the man pages bsub(1), bpeek(1), bjobs(1), bhosts(1), and so
    on.  There's some documentation at [1] - see in particular the `Quick
    Start Guide' and the reference card at [2]. 
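
    bsub reports the job number on standard output in a line like
    `Job <1234> is submitted to queue <brahms_cas>.'  A small sketch for
    picking that number out, so it can be handed to bjobs or bpeek (the
    function name lsf_jobid is made up, and I assume the message has
    this form on the installed LSF version):

```shell
# Hypothetical helper: read bsub's output on stdin, print the job number.
# Prints nothing if no "Job <N> is submitted" line is found.
lsf_jobid () {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*$/\1/p'
}

# Usage sketch (needs LSF, so it is left commented out):
#   jobid=`bsub < myjob.sh | lsf_jobid`
#   bjobs $jobid      # status of the job
#   bpeek $jobid      # output of the job so far
```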
    
    If you need to submit many jobs, I suggest you make a script like 
    
      #!/bin/sh 
      
      logdir=${HOME}/log 
      outdir=${HOME}/out 
      indir=${HOME}/in 
      config=${HOME}/config.C 
      user=`whoami` 
    
      runs="$*" 
      
      for run in $runs ; do 
    
        # Write a temporary script  
        cat > tmp.sh <<EOF    
      #!/bin/sh 
      #BSUB -q brahms_cas
      #BSUB -J ${user}_$run 
      #BSUB -o ${logdir}/${run}.out
      #BSUB -e ${logdir}/${run}.err 
    
      set -e
      unset DISPLAY 
    
      bratmain ${config}                     \
               -r $run                       \
               -o ${outdir}/out${run}.root   \
               -H ${outdir}/hist${run}.root  \
               -i ${indir}/in${run}.root     \
               -v 5 
      EOF
      
        # Submit the temporary script 
        bsub < tmp.sh 
    
        # remove the temporary script
        rm -f tmp.sh 
    
      done 
    
    Then you can submit a number of runs to the queue doing 
    
      ./mylsfsubmit <runs> 
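
    A variation on the submit script, as a sketch only: print the job
    script from a function and pipe it straight into bsub, so that no
    tmp.sh is left behind or overwritten if two submissions run at the
    same time.  The function name make_job and its argument order are
    made up; the bratmain options are the ones used above.

```shell
# Hypothetical helper: print an LSF job script for one run on stdout.
# usage: make_job <run> <config> <indir> <outdir> <logdir> <user>
make_job () {
  run=$1
  config=$2
  indir=$3
  outdir=$4
  logdir=$5
  user=$6

  # The variables are expanded now, when the job script is generated.
  # The backslash-newlines are swallowed by the here-document, so the
  # bratmain command ends up on a single line in the job script.
  cat <<EOF
#!/bin/sh
#BSUB -q brahms_cas
#BSUB -J ${user}_${run}
#BSUB -o ${logdir}/${run}.out
#BSUB -e ${logdir}/${run}.err

set -e
unset DISPLAY

bratmain ${config} -r ${run} \
  -o ${outdir}/out${run}.root \
  -H ${outdir}/hist${run}.root \
  -i ${indir}/in${run}.root \
  -v 5
EOF
}

# Usage sketch (needs LSF, so it is left commented out):
#   for run in $runs ; do
#     make_job $run $config $indir $outdir $logdir $user | bsub
#   done
```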
    
    The jobs will be queued and executed as soon as a processor on the CAS
    machines is available.  Notice that the jobs have a high nice value
    (low priority), and will be preempted (pushed off the processor) if a
    normal program is started by a user on the same CAS node - hence,
    using LSF is `behaving nicely'.  Each node in the CAS farm can run at
    most two LSF jobs.  LSF will automatically choose the fastest
    available CPU for the job execution.  Notice that /home/... is a
    separate, local directory on each machine.  If you have loads of disk
    I/O, then you may want to make a directory in /home/`whoami`, write
    your output there, and then copy the resulting file to ${HOME} when
    done: 
    
     
      #!/bin/sh 
      #BSUB -q brahms_cas
      #BSUB -o <standard out output file> 
      #BSUB -e <standard error output file> 
      #BSUB -J <name of your job> 
      
      unset DISPLAY 
    
      if test ! -d /home/`whoami`/lsfwork ; then 
        mkdir -p /home/`whoami`/lsfwork
      fi 
      cd /home/`whoami`/lsfwork
    
      bratmain <configuration script> [<options>] 
    
      cp <output> ${HOME}/out
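
    The same idea as a sketch with some error checking: the result is
    only copied back if the program succeeded, and the local copy is
    removed afterwards.  The function name run_local and its arguments
    are made up for illustration; in a job script the command would be
    the bratmain line.

```shell
# Hypothetical helper: run a command in a local work directory and copy
# one output file to a destination directory if the command succeeded.
# usage: run_local <workdir> <output file> <destdir> <command> [<args>]
# <destdir> should be an absolute path, since the copy is done from
# inside <workdir>.
run_local () {
  workdir=$1
  output=$2
  destdir=$3
  shift 3

  mkdir -p "$workdir" "$destdir" || return 1

  # The subshell keeps the caller's working directory unchanged
  if ( cd "$workdir" && "$@" && cp "$output" "$destdir"/ ) ; then
    # Everything went well, so the local copy is no longer needed
    rm -f "$workdir/$output"
  else
    echo "run_local: failed, any output is left in $workdir" >&2
    return 1
  fi
}
```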
    
    
    I cannot stress enough how much I recommend this kind of batch processing.
    LSF is a very clever piece of software that will most probably do the
    job you need faster than anything else.  
    
    Yours,
    
    Christian Holm Christensen -------------------------------------------
    Address: Sankt Hansgade 23, 1. th.           Phone:  (+45) 35 35 96 91 
             DK-2200 Copenhagen N                Cell:   (+45) 28 82 16 23
             Denmark                             Office: (+45) 353  25 305 
    Email:   cholm@nbi.dk                        Web:    www.nbi.dk/~cholm
    
    
    [1] http://www.rhic.bnl.gov/RCF/UserInfo/Software/LSF/
    [2] http://www.platform.com/services/support/docs/lsfdoc42/pdf/manuals/lsf_4.2_qrefcard.pdf
    


