Re: DC stuff again

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Thu May 02 2002 - 10:28:38 EDT

    Alors Mr. Ouerdane, 
    
    On Tue, 30 Apr 2002 16:43:15 +0200 (CEST)
    Djamel Ouerdane <ouerdane@nbi.dk> wrote
    concerning "DC problem":
    > Hi again,
    > 
    > I noticed a weird behaviour with the new DC rdo module. A few sequences 
    > crash on the crs farm with this error message:
    > 
    > Error: Symbol BrDcRdoModule is not defined in current scope  
    > FILE:/brahms/u/ouerdane/reduce/ProductionReduction.C LINE:252
    > 
    > I would say that 4/5 of the sequences are reduced. Any reason why ??
    > 
    > Djam
    
    
    More to follow (C-v to see it all - you are using Emacs for reading
    your mail, aren't you?  - what, M$ Outlook, Pine, Netscape? - you sick
    bastard - get a life :-) 
    
    On Tue, 30 Apr 2002 17:53:17 +0200 (CEST)
    Djamel Ouerdane <ouerdane@nbi.dk> wrote
    concerning "DC stuff again":
    > Pawel and I investigated a bit more the job log files:
    > 
    > Look at the root and brat versions. This sequence ran ok:
    > 
    > *********************************************
    > 
    >               B R A T   M A I N
    > 
    >   BRAT Version: 2.03/08
    >   BRAT Date:    Wed Jan 16 20:46:31 2002
    >   ROOT version: 3.03/04
    >   Host:         rcrs0017.rcf.bnl.gov
    >   Started:      Tue Apr 30 09:37:06 2002
    >   Script:       /brahms/u/ouerdane/reduce/ProductionReduction.C
    > 
    > *********************************************
    > 
    
    Old version of BRAT - hmm. 
     
    > 
    > Thsi sequence crashed:
      ^^^^
      ||||
    
      Use M-$ for spell check (in Emacs of course :-)  Ok, I'm not very
    good at doing that so, you know, it's like a, erh, ... thing.
    
    > *********************************************
    > 
    >               B R A T   M A I N
    > 
    >   BRAT Version: 2.03/01
    >   BRAT Date:    Fri Jan 18 11:28:07 2002
    >   ROOT version: 3.03/02
    >   Host:         rcrs0011.rcf.bnl.gov
    >   Started:      Tue Apr 30 09:41:42 2002
    >   Script:       /brahms/u/ouerdane/reduce/ProductionReduction.C
    > 
    > *********************************************
    > 
    > 
    > My question is : how can there be different versions of brat and 
    > root selected at run time when PATH and LD_LIBRARY_PATH are the same
    > in  each  case ??
    
    You mean BRAT and ROOT (capitals), don't you?  
    
    
    Now for the serious part of it.  
    
    As you all probably know, our `canonical' installations of BRAT, BREG,
    BRAG, BROP, CRASH, and ROOT are sitting on that not-so-stable part of
    the RCF systems that is under the control of AFS.  
    
    While AFS has many advantages, like:
    
    * Transparent Cross-platform Directory Structures - which makes it
      easy to maintain installations for various platforms (think about
      the @sys feature of AFS). 
    
    * (Almost) World Visibility - which means you only have to maintain
      one installation for each platform for the whole world.
    
    it also has some rather unfortunate features like:
    
    * Slow as s**t.  Suppose you're sitting on the other side of the pond
      from where the server is sitting - then you have to transport all of
      the bits of the filesystem across your 100kb/s connection before
      you can do anything.  To save disk access, remote access, etc., AFS
      caches most (if not all) files - this means that the second time you
      access a file, while the cache is up-to-date, you get faster access. 
    
    * The cache is too clever.  Sometimes it doesn't synchronise with the
      remote often enough.  As a result, the files you access are the
      ones in the cache - and they may be terribly out of date.  
    
    The latter is what happened in your case - AFS on rcrs0011.rcf.bnl.gov
    had cached the BRAT libraries etc.  Hence, when the dynamic loader
    `ld.so' asked for the file `libBratRdoModules.so.2', it got the old one
    that was sitting in the cache.  
    
    That can also happen on a regular system - but there, there's a remedy
    - just issue a `sync' command.  
    
    The situation is of course not improved by the fact that you're
    executing a program on CRS and you have no direct access to those
    machines, and so you cannot log in and force a `sync'. 
    
    Finally, there had been an update of BRAT to the new tree that
    bumped the minor version of BRAT, but the old installation wasn't
    uninstalled first (as it should have been).  That meant that AFS
    thought that the symbolic links `libBrat*.so.2' hadn't changed, and
    hence that the libraries were the same - which is flamboyantly wrong -
    and so the program crashes, as it depends on the new libraries.   
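    
    For what it's worth, one way to check this by hand is to resolve the
    symbolic links and see which real file each one points at.  Below is
    a minimal sketch in Python, assuming the libraries all sit in one
    directory.  The directory path is a placeholder and the function name
    `resolve_library_links' is made up for this example; it is not part
    of BRAT.

```python
import glob
import os


def resolve_library_links(lib_dir, pattern="libBrat*.so.2"):
    """Map each matching symlink in lib_dir to the real file it points at."""
    resolved = {}
    for link in sorted(glob.glob(os.path.join(lib_dir, pattern))):
        # realpath follows the whole chain of symbolic links to the final file
        resolved[os.path.basename(link)] = os.path.realpath(link)
    return resolved


if __name__ == "__main__":
    # Placeholder path (an assumption): replace with the real BRAT lib directory.
    for name, target in resolve_library_links("/path/to/brat/lib").items():
        print(f"{name} -> {target}")
```

    If two nodes report different targets for the same link, one of them
    is most likely looking at a stale cached copy.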
    
    So, to cut an already long story short: 
    
      * AFS is at fault
      * A poor update of BRAT is making things harder. 
    
    Solution:
    
      * AFS caches should be flushed. 
      * BRAT updates should be done properly. 
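    
    To spot which nodes picked up stale libraries, one could group the
    job logs by the versions printed in the BRAT banner.  Here is a rough
    sketch in Python, assuming each log contains the banner in the same
    layout as the ones quoted above.  The function names are made up for
    illustration only.

```python
import re
from collections import defaultdict

# Matches the three banner lines we care about, e.g. "  BRAT Version: 2.03/08"
BANNER_FIELD = re.compile(
    r"^\s*(BRAT Version|ROOT version|Host):\s*(\S+)", re.MULTILINE
)


def parse_banner(log_text):
    """Return the BRAT version, ROOT version and host found in one job log."""
    return {key: value for key, value in BANNER_FIELD.findall(log_text)}


def hosts_by_version(logs):
    """Group hosts by the (BRAT, ROOT) version pair their banner reports."""
    groups = defaultdict(list)
    for text in logs:
        info = parse_banner(text)
        key = (info.get("BRAT Version"), info.get("ROOT version"))
        groups[key].append(info.get("Host"))
    return dict(groups)


if __name__ == "__main__":
    sample = (
        "  BRAT Version: 2.03/01\n"
        "  ROOT version: 3.03/02\n"
        "  Host:         rcrs0011.rcf.bnl.gov\n"
    )
    for versions, hosts in hosts_by_version([sample]).items():
        print(versions, hosts)
```

    Any host that shows up under an old version pair is a candidate for
    a cache flush.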
    
    Yours, 
    
     ____ |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    
    Emacs is the only modern operating system which isn't multithreaded.
    


