Alors Mr. Ouerdane,

On Tue, 30 Apr 2002 16:43:15 +0200 (CEST)
Djamel Ouerdane <ouerdane@nbi.dk> wrote
concerning "DC problem":
> Hi again,
>
> I noticed a weird behaviour with the new DC rdo module. A few sequences
> crash on the crs farm with this error message:
>
> Error: Symbol BrDcRdoModule is not defined in current scope
> FILE:/brahms/u/ouerdane/reduce/ProductionReduction.C LINE:252
>
> I would say that 4/5 of the sequences are reduced. Any reason why ??
>
> Djam

More to follow (C-v to see it all - you are using Emacs for reading
your mail, aren't you? - what, M$ Outlook, Pine, Netscape - you sick
bastard - get a life :-)

On Tue, 30 Apr 2002 17:53:17 +0200 (CEST)
Djamel Ouerdane <ouerdane@nbi.dk> wrote
concerning "DC stuff again":
> Pawel and I investigated a bit more the job log files:
>
> Look at the root and brat versions. This sequence ran ok:
>
> *********************************************
>
>            B R A T   M A I N
>
>   BRAT Version:  2.03/08
>   BRAT Date:     Wed Jan 16 20:46:31 2002
>   ROOT version:  3.03/04
>   Host:          rcrs0017.rcf.bnl.gov
>   Started:       Tue Apr 30 09:37:06 2002
>   Script:        /brahms/u/ouerdane/reduce/ProductionReduction.C
>
> *********************************************
>

Old version of BRAT - hmm.

>
> Thsi sequence crashed:
  ^^^^
  ||||
  Use M-$ for spell check (in Emacs of course :-)

Ok, I'm not very good at doing that so, you know, it's like a, erh,
... thing.

> *********************************************
>
>            B R A T   M A I N
>
>   BRAT Version:  2.03/01
>   BRAT Date:     Fri Jan 18 11:28:07 2002
>   ROOT version:  3.03/02
>   Host:          rcrs0011.rcf.bnl.gov
>   Started:       Tue Apr 30 09:41:42 2002
>   Script:        /brahms/u/ouerdane/reduce/ProductionReduction.C
>
> *********************************************
>
>
> My question is : how can there be different versions of brat and
> root selected at run time when PATH and LD_LIBRARY_PATH are the same
> in each case ??

You mean BRAT and ROOT (capitals), don't you?

Now for the serious part of it.

As you all probably know, our `canonical' installations of BRAT, BREG,
BRAG, BROP, CRASH, and ROOT are sitting on that not-so-stable part of
the RCF systems that is under the control of AFS. While AFS has many
advantages, like:

  * Transparent Cross-platform Directory Structures - which makes it
    easy to maintain installations for various platforms (think about
    the @sys feature of AFS).

  * (Almost) World Visibility - which means you only have to maintain
    one installation for each platform for the whole world.

it also has some rather unfortunate features, like:

  * Slow as s**t. Suppose you're sitting on the other side of the pond
    from where the server is sitting - then you have to transport all
    of the bits of the filesystem across your 100kb/s connection
    before you can do anything. To save disk access, remote access,
    etc., AFS caches most (if not all) files - this means that the
    second time you access a file, while the cache is up to date, you
    get faster access.

  * The cache is too clever. Sometimes it doesn't synchronise with the
    remote often enough. The result is that the files you access are
    the ones in the cache - and they may be terribly out of date.

The latter is what happened in your case - AFS on rcrs0011.rcf.bnl.gov
had cached the BRAT libraries etc. Hence, when the dynamic loader
`ld.so' asked for the file `libBratRdoModules.so.2' it got the old one
that was sitting in the cache. That can also happen on a regular
system - but there, there's a remedy - just issue a `sync' command.

The situation is of course not improved by the fact that you're
executing a program on CRS and you have no direct access to those
machines, and so you cannot log in and force a `sync'.

Finally, there had been an update of BRAT to the new tree, which
updated the minor version of BRAT, but the old installation wasn't
uninstalled first (as it should have been).
That meant that AFS thought that the symbolic links `libBrat*.so.2'
hadn't been changed, and hence that the libraries were the same -
which is flamboyantly wrong - and so the program will crash, as it
depends on the new libraries.

So, to cut an already long story short:

  * AFS is at fault.
  * A poor update of BRAT is making things harder.

Solution:

  * AFS caches should be flushed (on a node where you can log in,
    `fs flushvolume' does that).
  * BRAT updates should be done properly.

Yours,

 ____ |  Christian Holm Christensen
  |_| |  -------------------------------------------------------------
    | |  Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
     _|           DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
    _|            Denmark                    Office: (+45) 353 25 305
 ____|   Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
 | |

Emacs is the only modern operating system which isn't multithreaded.

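
To see at a glance which CRS nodes picked up which libraries, the BRAT
start-up banners from the job logs can be grouped by the versions they
report. What follows is only a rough sketch in Python, not anything that
ships with BRAT: the function names `parse_banner' and
`hosts_by_version' are made up for illustration, and the only thing
taken from this thread is the banner layout (`BRAT Version:', `ROOT
version:', `Host:') as quoted above.

```python
import re
from collections import defaultdict

# Field labels exactly as they appear in the banner quoted in this thread.
# Leading non-word characters (spaces, "> " quote marks) are skipped.
BANNER_FIELD = re.compile(
    r"^\W*(BRAT Version|ROOT version|Host):\s*(\S+)", re.MULTILINE
)


def parse_banner(log_text):
    """Return the BRAT version, ROOT version and host found in one job log.

    Hypothetical helper: fields missing from the log are simply absent
    from the returned dict.
    """
    return {key: value for key, value in BANNER_FIELD.findall(log_text)}


def hosts_by_version(logs):
    """Group host names by the (BRAT, ROOT) version pair they reported.

    `logs` is any iterable of log file contents (strings).
    """
    groups = defaultdict(set)
    for text in logs:
        info = parse_banner(text)
        key = (info.get("BRAT Version"), info.get("ROOT version"))
        groups[key].add(info.get("Host"))
    return dict(groups)
```

If more than one version pair comes out of `hosts_by_version', the
nodes listed under the unexpected pair are the ones to suspect of
serving libraries from a stale cache.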
This archive was generated by hypermail 2b30 : Thu May 02 2002 - 10:29:22 EDT