Hi Selemon,

I actually also do it this way; the parameters are set in my scripts just before the Process() call for the TDSet. When I got this to work in consultation with the ROOT people, I tried multiple things, which is why my scripts set the parameters in possibly too many places, not all of which are effective.

Flemming

--------------------------------------------
Flemming Videbaek
Physics Department
Bldg 510-D
Brookhaven National Laboratory
Upton, NY11973

phone: 631-344-4106
cell: 631-681-1596
fax: 631-344-1334
e-mail: videbaek @ bnl gov

----- Original Message -----
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: "Bekele, Selemon" <bekeleku_at_ku.edu>; "Flemming Videbaek" <videbaek_at_bnl.gov>
Cc: "JH Lee" <jhlee_at_bnl.gov>; <brahms-dev-l_at_lists.bnl.gov>
Sent: Thursday, September 20, 2007 2:00 PM
Subject: RE: proof

>
> Hi,
>
> doing
>
> Long_t maxSlavePerNode = 9999;
> gProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);
>
> after (instead of before)
>
> gProof->UploadPackage("BratLibrary.par");
> gProof->EnablePackage("BratLibrary");
>
> seems to work very well and a lot faster.
> All the slave machines seem to be used. Now
> it is clear why my proof sessions were taking
> about an hour and a half to finish.
>
> Selemon;
>
> =======================
> void StartProof(Int_t MainNode){
>
>   TString cluster  = Form("rcas00%02d",MainNode);
>   TString confFile = Form("proof_rcas00%02d.conf",MainNode);
>   //gROOT->Proof(cluster.Data(),confFile.Data());
>   //fProof = new TProof(cluster.Data(),confFile.Data()); //does not work
>   fProof = TProof::Open(cluster.Data(),confFile.Data()); //added to work with new OS
>   //fProof->SetParameter("PROOF_MaxSlavesPerNode",9999); //does not compile with int
>   //Long_t maxSlavePerNode = 9999;
>   //fProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode); //was here before
>   //fProof->Open(cluster.Data(),confFile.Data());
>   gProof->UploadPackage("BratLibrary.par");
>   gProof->EnablePackage("BratLibrary");
>   Long_t maxSlavePerNode = 9999;
>   gProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode); //use gProof instead of fProof which
>                                                                   //was local to the master node
>
> }
> ========================
>
> -----Original Message-----
> From: Bekele, Selemon
> Sent: Thu 9/20/2007 12:35 PM
> To: Flemming Videbaek
> Cc: JH Lee; brahms-dev-l_at_lists.bnl.gov
> Subject: RE: proof
>
>
> Hi Flemming,
>
> I have only one
> TDSet * set;
> set->Process("selector","",..)
>
> per process.
>
> Below is a function called to start the proof session:
>
> ================
> void StartProof(Int_t MainNode){
>
>   TString cluster  = Form("rcas00%02d",MainNode);
>   TString confFile = Form("proof_rcas00%02d.conf",MainNode);
>   //gROOT->Proof(cluster.Data(),confFile.Data());
>   //fProof = new TProof(cluster.Data(),confFile.Data()); //does not work
>   fProof = TProof::Open(cluster.Data(),confFile.Data()); //added to work with new OS
>   //fProof->SetParameter("PROOF_MaxSlavesPerNode",9999); //does not compile with int
>   Long_t maxSlavePerNode = 9999;
>   fProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);
>   //fProof->Open(cluster.Data(),confFile.Data());
>   gProof->UploadPackage("BratLibrary.par");
>   gProof->EnablePackage("BratLibrary");
> }
> ================
>
> Selemon,
>
> -----Original Message-----
> From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
> Sent: Thu 9/20/2007 11:58 AM
> To: Bekele, Selemon
> Cc: JH Lee; brahms-dev-l_at_lists.bnl.gov
> Subject: proof
>
> Hi,
>
> I would really like to know how you access the nodes and run, i.e. do you have many TDSet * set; set->Process("selector","",..)
> calls in a sequence? I have seen that this can lead to an increase in running processes. I also see that in the session running right now
> only the process on 62 gets any CPU time. Where and when do you do the ->SetParameter("PROOF_MaxSlavesPerNode .. ?
>
> It does look peculiar. I do know that not all memory is released by the slaves at the end of a Process().
>
> Flemming
>
> ----- Original Message -----
> From: "Bekele, Selemon" <bekeleku_at_ku.edu>
> To: "Flemming Videbaek" <videbaek_at_bnl.gov>
> Cc: "JH Lee" <jhlee_at_bnl.gov>; <brahms-dev-l_at_lists.bnl.gov>
> Sent: Thursday, September 20, 2007 12:51 PM
> Subject: RE: [Brahms-dev-l] analysis meeting
>
>
> Hi Flemming,
>
> I have been monitoring my proof sessions for memory size.
> About an hour and 20 minutes into the sessions:
>
> the memory size on the slaves (63 - 68) is about 235 MB.
> the memory size on the slaves on 62 is about 2 GB.
>
> Is there any reason why the memory size on the 62 slaves should grow
> to about 10 times that on the other slave machines? Uneven sharing of
> loads between the slaves?
>
> Selemon,
>
> ============================
>
> rcas0062:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist 26091  1.3  2.7  127924  56976 ?   Ss   11:19  0:56 /opt/brahms/pro/bin/proofserv.exe proofserv
> tigist 26577 68.7 40.9 2108652 849448 ?   Rs   11:20 47:53 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist 26578 68.2 40.6 2053872 843488 ?   Ds   11:20 47:30 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0063:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist 14298  0.0  9.8  235092 203924 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist 14300  0.0  9.8  234548 203928 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0064:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist 15736  0.0  9.8  233564 203924 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist 15737  0.0  9.8  235480 203924 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0065:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist 16870  0.0  9.8  234236 203928 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist 16871  0.0  9.8  235148 203928 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0066:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist 16629  0.0  9.8  234880 203920 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist 16634  0.0  9.8  235464 203924 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0067:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist  9125  0.0  9.8  233836 203932 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist  9130  0.0  9.8  234328 203920 ?   Ss   11:20  0:03 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0068:
> USER     PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
> tigist  2143  0.0  9.8  235068 203928 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist  2145  0.0  9.8  234620 203928 ?   Ss   11:20  0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> -----Original Message-----
> From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
> Sent: Tue 9/18/2007 3:07 PM
> To: Bekele, Selemon
> Cc: JH Lee
> Subject: Re: [Brahms-dev-l] analysis meeting
>
> Hi Selemon,
>
> I see you are running a proofserv.exe slave on rcas0062, or maybe you are not; in any case the memory size there is 2.3 GB per process.
> Are you sure the processes do not have memory leaks?
>
> Flemming
>
> ----- Original Message -----
> From: "Bekele, Selemon" <bekeleku_at_ku.edu>
> To: "Flemming Videbaek" <videbaek_at_bnl.gov>; "devlist" <brahms-dev-l_at_lists.bnl.gov>
> Sent: Tuesday, September 18, 2007 1:40 PM
> Subject: RE: [Brahms-dev-l] analysis meeting
>
>
> Hi Flemming,
>
> In order to make sure that I am not doing something
> wrong, I ran again over the 57 files individually. This
> time it is a different run, 13345, that had empty histograms.
> I ran over the same file locally and with proof and found no
> problem with the file. The error message from /var/log/ROOT.log
> on rcas0055:
> =======================
> Sep 17 22:40:13 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4 (Bad file descriptor)
> Sep 17 22:40:13 rcas0055 last message repeated 3 times
> Sep 17 22:40:18 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4 (Bad file descriptor)
> ======================
>
> Part of the log file in proof (~tigist/ProofOutPut.dat) is shown below.
> It seems that a connection to some machine is reset at the very beginning of
> the session. Could this be a reset of the connection to a database in brahms?
>
> =======================
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
>
> Info in <TProofServ::SetQueryRunning> on master-0: starting query: 1
> Info in <TAdaptivePacketizer::TAdaptivePacketizer> on master-0: fraction of remote files 1.000000
> SysError in <TUnixSystem::UnixSend> on master-0: send (Connection reset by peer)
> SysError in <TUnixSystem::DispatchOneEvent> on master-0: select: read error on 4 (Bad file descriptor)
>
> =====================
>
> The problem I am facing seems to be quite random.
>
> As for the meeting this coming Friday, I need to resolve this issue
> beforehand, since I do not have anything new after the RCF upgrades.
>
> Selemon,
>
> -----Original Message-----
> From: brahms-dev-l-bounces_at_lists.bnl.gov on behalf of Flemming Videbaek
> Sent: Tue 9/18/2007 9:34 AM
> To: devlist
> Subject: [Brahms-dev-l] analysis meeting
>
> We will have an analysis meeting this coming Friday.
> There are planned presentations from Selemon on CuCu and from Ionut on AuAu at 62.
>
> The agenda page on indico has been set up for this meeting.
>
> Flemming
>

_______________________________________________
Brahms-dev-l mailing list
Brahms-dev-l_at_lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/brahms-dev-l

Received on Thu Sep 20 2007 - 14:32:34 EDT
This archive was generated by hypermail 2.2.0 : Thu Sep 20 2007 - 14:35:37 EDT