Hi Flemming et al,

On Sat, 17 Nov 2001 09:11:56 -0500 "Flemming Videbaek"
<videbaek@sgs1.hirg.bnl.gov> wrote concerning "Home on crs CRASH updates":

> If $HOME in fact points to different places on the crs nodes - and I
> have no doubt it does - it means that the RCF philosophy is broken on
> the farms. The intention is to have all machines look identical in
> terms of software setup. Christian or someone else, can you give
> specific cases and we will talk to Tony - as well as discussing the
> second (AFS) issue.

What I did was to submit the job

  ~bramreco/tests/query_env.jsf

which executes

  ~bramreco/tests/query_env.sh

on a single node (whichever it might be). The output is stored in the
following files:

  ~bramreco/tests/query_env.out      (standard out)
  ~bramreco/tests/query_env.err      (standard error)
  ~bramreco/tests/query_env.tree     (ls -R on /afs/rhic/opt/brahms/[pro|new])
  ~bramreco/tests/query_env.running  (empty)

Among other things, query_env.sh prints the environment on the node of
execution. Here's a snippet of the output (the rest can be found in the
files above):

  First the environment variables
  ===============================
  LOGNAME=bramreco
  MACHTYPE=i386
  CRS_JOB_FILE=query_env.jsf
  HOSTTYPE=i386-linux
  PATH=/bin:/usr/bin:/usr/local/bin:/usr/bin/X11
  HOME=/home/bramreco
  SHELL=/bin/tcsh
  USER=bramreco
  HOST=rcrs0032.rcf.bnl.gov
  ...
  AFS Sysname:
  Current sysname is 'i386_redhat61'
  ...

As you can see, HOME=/home/bramreco and _not_ /brahms/u/bramreco! I know
the intention is to have a homogeneous environment. This really does not
bother me too much, though - others may feel differently, of course.

If you want to run the test, just execute

  ~bramreco/tests/query_env.doit

It will clean the old output files and keep looping until the job has
executed, printing the progress.

On the AFS thing: I think you're likely to get the response from Tony
that the farm is not supposed to use AFS in the first place. This is
what I got from Tony on July 10, 2001, in reply to some other problem:
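In case it is useful, here is a minimal sketch of what a
query_env.sh-style diagnostic might look like. The actual script's
contents are not reproduced in this mail, so treat the details below as
assumptions; only the output headings and the 'fs sysname' line mirror
the snippet above.

```shell
#!/bin/sh
# Hypothetical sketch of a query_env.sh-style diagnostic; the real
# script lives in ~bramreco/tests/ and may differ. It dumps the
# environment the batch system hands the job, so a wrong HOME shows
# up immediately in the .out file.
echo "First the environment variables"
echo "==============================="
env | sort

# Record the HOME the node actually handed us, with a fallback so the
# report line is always present even if HOME is unset
home_seen="${HOME:-<not set>}"
echo "HOME as seen by the job: $home_seen"

# 'fs sysname' only exists where an AFS client is installed, so guard it
if command -v fs >/dev/null 2>&1; then
    echo "AFS Sysname:"
    fs sysname
else
    echo "AFS Sysname: (no AFS client on this node)"
fi
```

Comparing the reported HOME against /brahms/u/bramreco is exactly the
discrepancy shown above.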
  I found your "staging failed" job in the log files for the CRS jobs.
  It looks like some of your input files are being loaded from AFS. I
  told Flemming yesterday that only input files located in the
  following areas can run successfully:

    /brahms/u
    /brahms/data01, data02, data06

  The CRS software checks for the existence of the input files, and
  AFS files are not allowed. Can you move your AFS input files to one
  of the above disks and try again?

Now, /afs/rhic/opt/brahms/new/lib/bratmain.wrapper is not listed as an
input stream in the JSF files, so it isn't checked for existence
(otherwise we'd not get the error message), and I guess it isn't
checked whether the executable lives on AFS (otherwise, we couldn't
have used the farm as it is now).

When I asked Tony whether we could allow input files from AFS disks, I
got this reply (July 11, 2001):

  The RCF model agreed on by the experiments a long time ago is that
  input files come from either HPSS or from our central disk storage
  area (which is NFS-mounted on the Linux nodes). That's why we only
  check the origin of the input files. AFS binaries work (we don't
  stop them), but we don't recommend it.

Interlude: there you have it from the horse's mouth - executables on
AFS are OK. Tony goes on:

  The CRS software is very complicated, and it depends on HPSS, the
  network, and NFS. By adding an AFS dependency, you are just making
  it more complicated and more prone to potential problems. I don't
  see how this is an improvement. Our experience has shown that
  simpler systems are more reliable.

Interlude: the reason why we like to have the software on AFS is that
we only have to maintain _one_ installation for the rcas', rcrs',
piis, and other potential users in the US (AFS across the Atlantic is
a hassle). As Tony says, simple systems are good - that's also really
what we want: the simplicity of maintaining _one_ installation.
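Tony's description suggests an origin check along these lines. This is
a hypothetical sketch, not the actual CRS code; the allowed prefixes
are the ones from his mail, and the function name and example paths are
made up. It also shows why an AFS-resident executable slips through:
only paths named as input streams ever reach the check.

```shell
#!/bin/sh
# Hypothetical sketch of the origin check Tony describes; NOT the
# actual CRS code. Only input files listed in the .jsf are checked,
# so the executable's location is never inspected.
allowed_input_prefixes="/brahms/u /brahms/data01 /brahms/data02 /brahms/data06"

check_input () {
    # $1 = path of an input file named in the job file
    for prefix in $allowed_input_prefixes; do
        case "$1" in
            "$prefix"/*) return 0 ;;
        esac
    done
    return 1   # e.g. an /afs/... path -> "staging failed"
}

check_input /brahms/data02/run123/raw.dat && echo "accepted"
check_input /afs/rhic/opt/brahms/new/lib/bratmain.wrapper || echo "rejected"
```

The two example calls print "accepted" and "rejected" respectively:
the NFS path matches the /brahms/data02 prefix, while the AFS path
matches none of the allowed areas.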
Of course we could install all our software on an NFS disk and mount
that disk on all rcas', rcrs', rmines, and piis, but people outside
BNL would then not have those installations available, and we wouldn't
have the benefit of the '@sys' feature of AFS. To me, being across the
Atlantic and therefore not using the AFS installations, I could
definitely live with moving our stuff to NFS. Another possibility, if
technically feasible, would be to NFS-mount the AFS disks on the rcrs
nodes, so that /afs/rhic/opt/brahms is in fact a mount of the AFS
disks for i386_linux - without using AFS.

Now back to Tony:

  In any case, the CRS software is the same for all 4 experiments
  (BRAHMS, PHENIX, PHOBOS, STAR). We don't customize the software for
  any experiment. Any changes you request have to be made by your
  official RCF Liaison (Flemming or Betty) and presented to the RCF
  and to the other experiments.

Interlude: Flemming and Betty, I guess this really puts the ball in
your court :-) And Tony continues:

  If no one objects, a new version is created, and it is extensively
  tested by everyone before it becomes the "production version". The
  coding and testing typically takes weeks, so I think the chance of
  your requested change becoming the production version for the
  current run is very low, although I wouldn't rule it out for the
  next run.

Which is in a week's time? I mean, the current run ends soon
(disregarding the p+p run), so I guess there'll be time to make
structural changes.

I hope this helps you resolve matters.

Yours,

Christian Holm Christensen
-------------------------------------------
Address: Sankt Hansgade 23, 1. th.   Phone:  (+45) 35 35 96 91
         DK-2200 Copenhagen N        Cell:   (+45) 28 82 16 23
         Denmark                     Office: (+45) 353 25 305
Email:   cholm@nbi.dk                Web:    www.nbi.dk/~cholm
This archive was generated by hypermail 2b30 : Sat Nov 17 2001 - 09:50:03 EST