Re: Home on crs CRASH updates

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Sat Nov 17 2001 - 09:49:24 EST

    Hi Flemming et al, 
    
    On Sat, 17 Nov 2001 09:11:56 -0500
    "Flemming Videbaek" <videbaek@sgs1.hirg.bnl.gov> wrote
    concerning "Home on crs  CRASH updates":
    > If $HOME in fact points to different places on the crs nodes - and I
    > have no doubt it does - it means that the RCF philosophy is broken on
    > the farms. The intention is to have all machines look identical
    > in terms of software setup. Christian or someone else, can you give
    > specific cases, and we will talk to Tony - as well as discussing
    > the second (AFS) issue
    
    What I did was to submit the job 
    
       ~bramreco/tests/query_env.jsf
    
    which executes 
      
       ~bramreco/tests/query_env.sh 
    
    on a single node (whichever it might be).  The output is stored in
    four files:
    
       ~bramreco/tests/query_env.out     (standard out) 
       ~bramreco/tests/query_env.err     (standard error) 
       ~bramreco/tests/query_env.tree    (ls -R on /afs/rhic/opt/brahms/[pro|new]) 
       ~bramreco/tests/query_env.running (empty)
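
    For reference, the heart of query_env.sh is essentially the
    following - a sketch only, the file names match those above but
    check the actual script for the details:

       #!/bin/sh
       # Sketch of query_env.sh: print the environment and the AFS
       # sysname on standard out, dump the AFS tree to the .tree file,
       # and leave an (empty) marker file.
       echo "First the environment variables"
       echo "==============================="
       env
       echo ""
       echo "AFS Sysname: `fs sysname`"
       ls -R /afs/rhic/opt/brahms/pro /afs/rhic/opt/brahms/new \
           > ~bramreco/tests/query_env.tree
       touch ~bramreco/tests/query_env.running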
    
    Among other things, query_env.sh prints the environment on the
    node of execution.  Here's a snippet of the output (the rest can be
    found in the files above):
      
      First the environment variables
      ===============================
      LOGNAME=bramreco
      MACHTYPE=i386
      CRS_JOB_FILE=query_env.jsf
      HOSTTYPE=i386-linux
      PATH=/bin:/usr/bin:/usr/local/bin:/usr/bin/X11
      HOME=/home/bramreco
      SHELL=/bin/tcsh
      USER=bramreco
      HOST=rcrs0032.rcf.bnl.gov
      ...
    
      AFS Sysname: Current sysname is 'i386_redhat61'
      ...
    
    As you can see, HOME=/home/bramreco and _not_ /brahms/u/bramreco!  I
    know the intention is to have a homogeneous environment, but this
    really does not bother me too much, though - others may feel
    differently, of course.  
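
    If a job script really needs the canonical home directory, one
    defensive workaround - just a sketch, not something RCF endorses -
    is to set it explicitly at the top of the script (the nodes run
    tcsh, as the SHELL variable above shows):

       # tcsh, matching SHELL=/bin/tcsh above; force HOME to the
       # canonical BRAHMS home instead of the node-local /home/bramreco
       setenv HOME /brahms/u/bramreco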
    
    If you want to run the test, just execute 
    
       ~bramreco/tests/query_env.doit
    
    It will clean the old output files, and keep looping until the job has
    executed, printing the progress. 
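
    The polling logic of query_env.doit is roughly the following (again
    a sketch - the submission step is left out, and the real script may
    differ):

       #!/bin/sh
       # Remove output from any previous run.
       rm -f ~bramreco/tests/query_env.out  ~bramreco/tests/query_env.err
       rm -f ~bramreco/tests/query_env.tree ~bramreco/tests/query_env.running
       # ... submit query_env.jsf to CRS here ...
       # Poll until the job has written its standard out, printing dots.
       while [ ! -f ~bramreco/tests/query_env.out ] ; do
           echo -n "."
           sleep 30
       done
       echo " done"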
    
    On the AFS thing.  I think you're likely to get the response from Tony
    that the farm is not supposed to use AFS in the first place.  This is
    what I got from Tony on July 10, 2001, in reply to some other problem: 
    
      I found your "staging failed" job in the log files for
      the CRS jobs. It looks like some of your input files
      are being loaded from AFS. I told Flemming yesterday
      that only input files located in the following areas
      can run successfully:
    
        /brahms/u
        /brahms/data01, data02, data06
    
      The CRS software checks for the existence of the input
      files, and AFS files are not allowed. Can you move your 
      AFS input files to one of the above disks and try again?
    
    Now, /afs/rhic/opt/brahms/new/lib/bratmain.wrapper is not listed as an
    input stream in the JSF files, so it isn't checked for existence
    (otherwise we would get the error message), and I guess there is no
    check on whether the executable lives on AFS (otherwise, we couldn't
    have used the farm as we do now).  When I asked Tony whether we could
    allow input files from AFS disks, I got this reply (July 11, 2001): 
    
      The RCF model agreed by the experiments a long time ago is that
      input files come from either HPSS or from our central disk storage
      area (which is NFS-mounted on the Linux nodes). That's why we only
      check the origin of the input files. AFS binaries work (we don't
      stop them), but we don't recommend it. 
    
    Interlude:  there you have it from the horse's mouth - executables on
    AFS are OK. Tony goes on: 
    
      The CRS software is very complicated, and it depends on HPSS,
      network and NFS. By adding AFS-dependency, you are just making it
      more complicated and more prone to potential problems. I don't see
      how this is an improvement.  Our experience has shown that simpler
      systems are more reliable. 
    
    Interlude:  the reason we like to have the software on AFS is that we
    then only have to maintain _one_ installation for the rcas, rcrs, and
    pii machines, and for other potential users in the US (AFS over the
    Atlantic is a hassle).  
    
    As Tony says, simple systems are good - that's also really what we
    want: the simplicity of maintaining _one_ installation. Of course we
    could install all our software on an NFS disk, and mount that disk on
    all the rcas, rcrs, rmine, and pii machines, but people outside BNL
    would then not have those installations available, and we wouldn't
    have the benefit of the '@sys' feature of AFS (see the example
    below).  Since I'm across the Atlantic myself, and therefore not
    using the AFS installations anyway, I could definitely live with
    moving our stuff to NFS. 
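
    To illustrate the '@sys' feature: the AFS client expands the
    literal path component '@sys' to the local sysname (i386_redhat61
    on the farm nodes, as seen above), so a single link serves every
    platform.  The path layout here is only illustrative:

       # One symbolic link, resolved per-client by the AFS cache manager:
       ln -s /afs/rhic/opt/brahms/@sys/pro /opt/brahms/pro
       # On an rcrs node '@sys' expands to i386_redhat61, so the link
       # points at /afs/rhic/opt/brahms/i386_redhat61/pro.
       # NFS has no equivalent per-client expansion.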
    
    Another possibility, if technically feasible, would be to NFS-mount
    the AFS disks on the rcrs nodes, so that /afs/rhic/opt/brahms is in
    fact an NFS mount of the AFS volumes for i386_linux - without using
    AFS itself. 
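
    Such a mount might look like the line below in /etc/fstab on each
    rcrs node - 'afsgw' is a hypothetical machine that re-exports the
    AFS area over NFS, so take this purely as a sketch:

       # hypothetical NFS re-export of the BRAHMS AFS area (read-only)
       afsgw.rcf.bnl.gov:/afs/rhic/opt/brahms  /afs/rhic/opt/brahms  nfs  ro  0 0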
    
    Now back to Tony:
    
      In any case, the CRS software is the same for all 4 experiments  
      (BRAHMS, PHENIX, PHOBOS, STAR). We don't customize the software 
      for any experiment. Any changes you request have to be made by 
      your official RCF Liaison (Flemming or Betty) and presented to 
      the RCF and to the other experiments. 
    
    Interlude: Flemming and Betty, I guess this really puts the ball in
    your court :-) And Tony continues: 
    
      If no one objects, a new version is created, and it is extensively
      tested by everyone before it becomes the "production version". The
      coding and testing typically takes weeks, so I think the chance of
      your requested change becoming the production version for the
      current run is very low, although I wouldn't rule it out for the
      next run. 
    
    Which is in a week's time? I mean, the current run ends soon
    (disregarding the p+p run), so I guess there'll be time to make
    structural changes.  
    
    I hope this helps you resolve matters. 
    
    Yours, 
    
    Christian Holm Christensen -------------------------------------------
    Address: Sankt Hansgade 23, 1. th.           Phone:  (+45) 35 35 96 91 
             DK-2200 Copenhagen N                Cell:   (+45) 28 82 16 23
             Denmark                             Office: (+45) 353  25 305 
    Email:   cholm@nbi.dk                        Web:    www.nbi.dk/~cholm
    


