CRASH updates

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Sat Nov 17 2001 - 08:33:32 EST

    Hi all, 
    
    I made some changes to CRASH, and installed the new version on AFS. 
    
      Version: 1.2.3
      CVS tag: CRASH-1-2-3 
    
    The most important changes: 
    
    * To connect to the databases, a job needs to read
      ${HOME}/.bratdbrc, since there is no way to type in the password
      interactively on the farm.  Now, /brahms/u/bramreco/.bratdbrc does
      exist, but on some of the nodes on the farm (if not all), ${HOME}
      points to /home/bramreco.  Hence we need to copy
      /brahms/u/bramreco/.bratdbrc to /home/bramreco/.bratdbrc, and the
      wrapper script now does this whenever ${HOME} is not
      /brahms/u/bramreco.  I think this change of home directory is a
      fairly recent thing, since it used to work with
      /brahms/u/bramreco/.bratdbrc.
    
    * The notify flag (-n/--notify) now works!  It turned out to be a
      really stupid mistake on my part.  It can even send mail to
      addresses outside of RCF, e.g., your home email.  That way, it
      should be much easier for users of the farm to get a feel for how
      the jobs went, and the mailbox of bramreco shouldn't fill up.
    
      So I'd like to suggest that, when someone submits jobs on behalf of
      others, they use the -n/--notify option to send emails pertaining to
      the job to the user that requested the job.
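
The .bratdbrc copy described in the first item can be sketched roughly as below. This is only an illustration, not the code in scripts/bratmain.wrapper.in: the function name ensure_bratdbrc is made up here, and the chmod is an added precaution (the file holds the database password), not something the changelog mentions.

```shell
#!/bin/sh
# Sketch only: ensure_bratdbrc is a hypothetical helper, not the actual
# code from scripts/bratmain.wrapper.in.
# Usage: ensure_bratdbrc SOURCE_HOME
ensure_bratdbrc() {
    src_home=$1
    rc=.bratdbrc

    # Nothing to do when HOME already is the directory holding the file.
    [ "$HOME" = "$src_home" ] && return 0

    # Give up if the source file cannot be read from this node.
    [ -r "$src_home/$rc" ] || return 1

    # Copy into the node-local HOME and keep the copy private
    # (assumption: the file contains the database password).
    cp "$src_home/$rc" "$HOME/$rc" && chmod 600 "$HOME/$rc"
}

# On the farm one would call, for example:
#   ensure_bratdbrc /brahms/u/bramreco
```

The function returns non-zero when the source file is missing, so a wrapper could stop early instead of failing later at the database connection.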
    
    There still seems to be a problem that I don't really understand.
    Sometimes the jobs crash, and the .err file contains the line 
    
      sh: /afs/rhic/opt/brahms/new/lib/crash/bratmain.wrapper: No such file or directory
    
    Which is odd, since that file does in fact exist.  What is even more
    odd is that it sometimes happens on the same node that sh first
    cannot find the file, and later on it can!  See for example 
    
      /brahms/data03/crash/bm/log/run004640seq030.[out,err] (succeeded) 
      /brahms/data03/crash/bm/log/run004640seq009.err       (failed) 
    
    My initial feeling was, that it had to do with the AFS cache on the
    nodes, but now I'm not so sure.  I wonder if it has anything to do
    with the increased number of crashes that Tony Chan was talking about.
    Any ideas are more than welcome. 
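
One way to gather more information would be to check for the wrapper a few times before running it, logging the node name and time on every miss. The sketch below is only a diagnostic idea, not part of CRASH; wait_for_file and its default retry count and delay are made up for illustration.

```shell
#!/bin/sh
# Sketch only: wait_for_file is a hypothetical helper, not part of CRASH.
# Usage: wait_for_file PATH [TRIES] [DELAY_SECONDS]
wait_for_file() {
    path=$1
    tries=${2:-5}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$tries" ]; do
        # Success as soon as the file is visible on this node.
        [ -f "$path" ] && return 0

        # Log node and time of the miss, so failures can be compared later.
        echo "$(uname -n) $(date): attempt $i: $path not found" >&2

        # Pause before the next attempt, but not after the last one.
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# For example, before starting the wrapper:
#   wait_for_file /afs/rhic/opt/brahms/new/lib/crash/bratmain.wrapper 5 2
```

If the file shows up after a retry or two, that would point towards a caching or timing issue on the node rather than a missing installation.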
    
    From the changelog:
    ------------------
    2001-11-16  Brahms Software Librarian  <brahmlib@rcf.rhic.bnl.gov>
    
    	* configure.in, scripts/bmsubmit.in: Fix and revision
    
    2001-11-14  Brahms Software Librarian  <brahmlib@rcf.rhic.bnl.gov>
    
    	* configure.in: Bump revision
    
    	* scripts/bratmain.wrapper.in: fix to home dir
    
    	* scripts/bratmain.wrapper.in:
    	Fix if $HOME != /brahms/u/bramreco, so that /brahms/u/bramreco/.bratdbrc is
    	copied to ${HOME}.
    
    	* config/brat.m4, config/debug.m4, config/root.m4:
    	Moved Autoconf macros to acinclude.m4
    
    	* acinclude.m4, configure.in: Bumped minor version number
    	Moved Autoconf macros to acinclude.m4
    
    Yours, 
    
    Christian Holm Christensen -------------------------------------------
    Address: Sankt Hansgade 23, 1. th.           Phone:  (+45) 35 35 96 91 
             DK-2200 Copenhagen N                Cell:   (+45) 28 82 16 23
             Denmark                             Office: (+45) 353  25 305 
    Email:   cholm@nbi.dk                        Web:    www.nbi.dk/~cholm
    


