Hi Selemon,

I believe that PROOF sessions die when they reach an Error, even if the local jobs run. I have seen this under some circumstances. The condor jobs running are, in my opinion, a red herring with regard to PROOF, though there are other issues with that.

Flemming
--------------------------------------------
Flemming Videbaek
Physics Department
Bldg 510-D
Brookhaven National Laboratory
Upton, NY11973

phone: 631-344-4106
cell: 631-681-1596
fax: 631-344-1334
e-mail: videbaek @ bnl gov

----- Original Message -----
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: "Flemming Videbaek" <videbaek_at_bnl.gov>
Cc: "devlist" <brahms-dev-l_at_lists.bnl.gov>
Sent: Saturday, May 05, 2007 2:09 PM
Subject: RE: [Brahms-dev-l] Proof sessions

Hi Flemming,

I have run the code locally and saw no problem with it. I have also run with PROOF over all events from all files for the setting 90B350 the whole afternoon yesterday and saw no problems. As for the errors below, I will clean up my code, but they do not seem to matter.

Looking at the PROOF progress monitor, the second session is stuck with only 0.8 seconds left for the session to finish. I still think that whenever another CPU-intensive job is running on the PROOF nodes, the sessions are suspended.

Selemon

-----Original Message-----
From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
Sent: Sat 5/5/2007 12:03 PM
To: Bekele, Selemon
Cc: devlist
Subject: Re: [Brahms-dev-l] Proof sessions

Hi Selemon,

The problem is with the files and/or code used. When I look at /var/log/ROOT.log I see

-- i.e., it has nothing to do with condor.

Try to run your session with just a few thousand events and get that to work before submitting.
I also recommend you go to each node and kill the proofserv processes (3 on 41, two on subsequent nodes)

Flemming

> FL.fSi1bMult
May 4 22:09:58 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1cMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1dMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1eMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1fMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1gMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1aEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1bEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1cEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1dEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1eEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1fEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1gEta
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FS.fVertexFlag
May 4 22:10:00 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FFS.fVertexFlag
May 4 22:10:02 rcas0041 proofslave[9165]: !!!cleanup!!!
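The log excerpt above can be condensed into a list of the branch names that the code requests but the tree does not contain, which makes it easier to see what needs cleaning up. What follows is only a minimal Python sketch under stated assumptions: it assumes syslog lines shaped like the ones quoted above, and the names `unknown_branches`, `UNKNOWN_BRANCH`, and `SAMPLE` are illustrative, not part of ROOT or PROOF.

```python
import re

# Pattern for the ROOT error text as it appears in the syslog excerpt above.
# It is an assumption derived from those lines only.
UNKNOWN_BRANCH = re.compile(
    r"Error:<TTree::SetBranchAddress>:unknown branch -> (\S+)"
)


def unknown_branches(log_text):
    """Return unknown branch names in first-seen order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = UNKNOWN_BRANCH.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Three error lines and the cleanup line copied from the excerpt above.
SAMPLE = """\
May 4 22:09:58 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FL.fSi1cMult
May 4 22:09:59 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FS.fVertexFlag
May 4 22:10:00 rcas0041 proofslave[9165]: tigist:slave 0.0:Error:<TTree::SetBranchAddress>:unknown branch -> FFS.fVertexFlag
May 4 22:10:02 rcas0041 proofslave[9165]: !!!cleanup!!!
"""

if __name__ == "__main__":
    for name in unknown_branches(SAMPLE):
        print(name)
```

In practice the text would presumably come from reading /var/log/ROOT.log on the node rather than from an inline sample; lines that do not match the pattern, such as the cleanup message, are simply skipped.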
--------------------------------------------
Flemming Videbaek
Physics Department
Bldg 510-D
Brookhaven National Laboratory
Upton, NY11973

phone: 631-344-4106
cell: 631-681-1596
fax: 631-344-1334
e-mail: videbaek @ bnl gov

----- Original Message -----
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: <brahms-dev-l_at_lists.bnl.gov>
Sent: Saturday, May 05, 2007 12:52 PM
Subject: [Brahms-dev-l] Proof sessions

>
> Hi,
>
> I have been trying to run proof sessions
> (6 centrality bins X 6 field settings = 30 sessions)
> with the master node on rcas0041. I run the sessions
> sequentially for each centrality from a shell script
> and only the very first session has finished
> since 9:00 PM Friday night and the second session
> is suspended, which means the subsequent runs could
> not be done.
>
> Doing
>
> rcas0041:> condor_status -claimed
>
> I see:
>
> vm1_at_rcas0041.  LINUX  INTEL  0.820  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0041.  LINUX  INTEL  0.870  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0042.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0042.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0043.  LINUX  INTEL  0.800  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0043.  LINUX  INTEL  0.820  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0044.  LINUX  INTEL  0.830  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0044.  LINUX  INTEL  0.860  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0045.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0045.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0046.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0046.  LINUX  INTEL  0.000  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm1_at_rcas0047.  LINUX  INTEL  0.750  claudius_at_bnl.gov  rcas2065.rcf.bn
> vm2_at_rcas0047.  LINUX  INTEL  0.710  claudius_at_bnl.gov  rcas2065.rcf.bn
>
>
> It seems like the proof sessions are suspended because,
> I think, someone is running CPU-intensive jobs on the
> BRAHMS rcas machines. I do not think changing to a different
> master node would help, as all the BRAHMS machines seem to be
> taken.
>
> Has anyone faced the same problem and found a quick solution, or
> do I just need to wait until the machines become free?
>
> Selemon
>
> _______________________________________________
> Brahms-dev-l mailing list
> Brahms-dev-l_at_lists.bnl.gov
> https://lists.bnl.gov/mailman/listinfo/brahms-dev-l
>

Received on Sat May 05 2007 - 14:13:37 EDT