Re: [Brahms-dev-l] proof

From: Flemming Videbaek <videbaek_at_bnl.gov>
Date: Thu, 20 Sep 2007 14:30:47 -0400
Hi Selemon,

I actually also do it this way; the parameters are set in my scripts just before the Process() call for the TDSet.
When I first got this working, after consulting with the ROOT people, I tried multiple things; that is why my scripts
set the parameters in possibly too many places, not all of which are effective.
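
Schematically, something like this; the tree name, input file, and selector below are placeholders rather than my actual analysis code:

========================
// Minimal sketch (ROOT/PROOF); all names are placeholders.
TDSet *set = new TDSet("TTree", "T");              // "T": placeholder tree name
set->Add("root://rcas0063//data/run13345.root");   // hypothetical input file
Long_t maxSlavePerNode = 9999;
gProof->SetParameter("PROOF_MaxSlavesPerNode", maxSlavePerNode); // set just before Process()
set->Process("MySelector");                        // "MySelector": placeholder selector
========================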

Flemming
--------------------------------------------
Flemming Videbaek
Physics Department
Bldg 510-D
Brookhaven National Laboratory
Upton, NY11973

phone: 631-344-4106
cell:       631-681-1596
fax:        631-344-1334
e-mail: videbaek @ bnl gov
----- Original Message ----- 
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: "Bekele, Selemon" <bekeleku_at_ku.edu>; "Flemming Videbaek" <videbaek_at_bnl.gov>
Cc: "JH Lee" <jhlee_at_bnl.gov>; <brahms-dev-l_at_lists.bnl.gov>
Sent: Thursday, September 20, 2007 2:00 PM
Subject: RE: proof


>
>
> Hi,
>
> doing
>
>  Long_t maxSlavePerNode = 9999;
>  gProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);
>
> after (instead of before)
>
>  gProof->UploadPackage("BratLibrary.par");
>  gProof->EnablePackage("BratLibrary");
>
> seems to work very well and is a lot faster.
> All the slave machines seem to be used. Now
> it is clear why my proof sessions were taking
> about an hour and a half to finish.
>
> Selemon,
>
> =======================
> void StartProof(Int_t MainNode){
>
>  TString cluster = Form("rcas00%02d",MainNode);
>  TString confFile = Form("proof_rcas00%02d.conf",MainNode);
>  //gROOT->Proof(cluster.Data(),confFile.Data());
>  //fProof = new TProof(cluster.Data(),confFile.Data());    //does not work
>  fProof = TProof::Open(cluster.Data(),confFile.Data());    //added to work with new OS
>  //fProof->SetParameter("PROOF_MaxSlavesPerNode",9999);     //does not compile with int
>  //Long_t maxSlavePerNode = 9999;
>  //fProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);   //was here before
>  //fProof->Open(cluster.Data(),confFile.Data());
>  gProof->UploadPackage("BratLibrary.par");
>  gProof->EnablePackage("BratLibrary");
>  Long_t maxSlavePerNode = 9999;
>  gProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);    //use gProof instead of fProof, which
>                                                                     //was local to the master node
>
> }
> ========================
>
> -----Original Message-----
> From: Bekele, Selemon
> Sent: Thu 9/20/2007 12:35 PM
> To: Flemming Videbaek
> Cc: JH Lee; brahms-dev-l_at_lists.bnl.gov
> Subject: RE: proof
>
>
> Hi Flemming,
>
>  I have only one
>      TDSet * set;
>      set->Process("selector","",..)
>
>  call per process (a schematic version is sketched below, after the StartProof function).
>
> Below is a function called to start the proof session:
>
> ================
> void StartProof(Int_t MainNode){
>
>  TString cluster = Form("rcas00%02d",MainNode);
>  TString confFile = Form("proof_rcas00%02d.conf",MainNode);
>  //gROOT->Proof(cluster.Data(),confFile.Data());
>  //fProof = new TProof(cluster.Data(),confFile.Data());    //does not work
>  fProof = TProof::Open(cluster.Data(),confFile.Data());    //added to work with new OS
>  //fProof->SetParameter("PROOF_MaxSlavesPerNode",9999);     //does not compile with int
>  Long_t maxSlavePerNode = 9999;
>  fProof->SetParameter("PROOF_MaxSlavesPerNode",maxSlavePerNode);
>  //fProof->Open(cluster.Data(),confFile.Data());
>  gProof->UploadPackage("BratLibrary.par");
>  gProof->EnablePackage("BratLibrary");
> }
> ================
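>
> For completeness, the driving code looks roughly like this; the tree name,
> input files, and selector are placeholders rather than the actual analysis code:
>
> ================
> StartProof(55);                                // e.g. master on rcas0055
> TDSet *set = new TDSet("TTree", "T");          // "T": placeholder tree name
> set->Add("root://rcas0063//data/file1.root");  // hypothetical input files
> set->Add("root://rcas0064//data/file2.root");
> set->Process("MySelector");                    // the single Process() call of the session
> ================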
>
> Selemon,
>
> -----Original Message-----
> From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
> Sent: Thu 9/20/2007 11:58 AM
> To: Bekele, Selemon
> Cc: JH Lee; brahms-dev-l_at_lists.bnl.gov
> Subject: proof
>
> Hi,
>
> I would really like to know how you access the nodes and run, i.e. do you have many TDSet * set;  set->Process("selector","",..)
> calls in a sequence? I have seen that such a sequence can increase the number of running processes. I also see that in the session
> running right now, only the processes on 62 get any CPU time. Where/when do you do the ->SetParameter("PROOF_MaxSlavesPerNode", ...) call?
>
> It does look peculiar. I do know that not all memory is released at the end of a Process() on the slaves.
>
> Flemming
>
>
>
>
>
> --------------------------------------------
> Flemming Videbaek
> Physics Department
> Bldg 510-D
> Brookhaven National Laboratory
> Upton, NY11973
>
> phone: 631-344-4106
> cell:       631-681-1596
> fax:        631-344-1334
> e-mail: videbaek @ bnl gov
> ----- Original Message ----- 
> From: "Bekele, Selemon" <bekeleku_at_ku.edu>
> To: "Flemming Videbaek" <videbaek_at_bnl.gov>
> Cc: "JH Lee" <jhlee_at_bnl.gov>; <brahms-dev-l_at_lists.bnl.gov>
> Sent: Thursday, September 20, 2007 12:51 PM
> Subject: RE: [Brahms-dev-l] analysis meeting
>
>
>
>
> Hi Flemming,
>
>    I have been monitoring my proof sessions for memory size.
> About an hour and 20 minutes into the sessions:
>
> the memory size on the slaves (rcas0063 - rcas0068) is about 235 MB;
> the memory size on the slaves on rcas0062 is about 2 GB.
>
> Is there any reason why the memory size of the slaves on rcas0062 should grow
> to about 10 times that of the slaves on the other machines? Uneven sharing of
> the load between the slaves?
>
> Selemon,
>
> ============================
>
> rcas0062:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist   26091  1.3  2.7 127924 56976 ?      Ss   11:19   0:56 /opt/brahms/pro/bin/proofserv.exe proofserv
> tigist   26577 68.7 40.9 2108652 849448 ?    Rs   11:20  47:53 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist   26578 68.2 40.6 2053872 843488 ?    Ds   11:20  47:30 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0063:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist   14298  0.0  9.8 235092 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist   14300  0.0  9.8 234548 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0064:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist   15736  0.0  9.8 233564 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist   15737  0.0  9.8 235480 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0065:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist   16870  0.0  9.8 234236 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist   16871  0.0  9.8 235148 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0066:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist   16629  0.0  9.8 234880 203920 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist   16634  0.0  9.8 235464 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0067:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist    9125  0.0  9.8 233836 203932 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist    9130  0.0  9.8 234328 203920 ?     Ss   11:20   0:03 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> rcas0068:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> tigist    2143  0.0  9.8 235068 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
> tigist    2145  0.0  9.8 234620 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
>
> -----Original Message-----
> From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
> Sent: Tue 9/18/2007 3:07 PM
> To: Bekele, Selemon
> Cc: JH Lee
> Subject: Re: [Brahms-dev-l] analysis meeting
>
> Hi Selemon,
>
> I see you are running proofserv.exe slaves on rcas0062, or maybe you are not; in any case, their memory size is 2.3 GB per process.
> Are you sure the processes do not have memory leaks?
>
> Flemming
>
> --------------------------------------------
> Flemming Videbaek
> Physics Department
> Bldg 510-D
> Brookhaven National Laboratory
> Upton, NY11973
>
> phone: 631-344-4106
> cell:       631-681-1596
> fax:        631-344-1334
> e-mail: videbaek @ bnl gov
> ----- Original Message ----- 
> From: "Bekele, Selemon" <bekeleku_at_ku.edu>
> To: "Flemming Videbaek" <videbaek_at_bnl.gov>; "devlist" <brahms-dev-l_at_lists.bnl.gov>
> Sent: Tuesday, September 18, 2007 1:40 PM
> Subject: RE: [Brahms-dev-l] analysis meeting
>
>
>
> Hi Flemming,
>
>    In order to make sure that I am not doing something
> wrong, I ran again over the 57 files individually. This
> time it is a different run, 13345, that had empty histograms.
> I ran over the same file locally and with proof and found no
> problem with the file. The error message from /var/log/ROOT.log
> on rcas0055:
> =======================
> Sep 17 22:40:13 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4  (Bad
> file descriptor)
> Sep 17 22:40:13 rcas0055 last message repeated 3 times
> Sep 17 22:40:18 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4  (Bad
> file descriptor)
> ======================
>
> Part of the proof log file (~tigist/ProofOutPut.dat) is shown below.
> It seems that a connection to some machine is reset at the very beginning of
> the session. Could this be a reset of the connection to a database in BRAHMS?
>
> =======================
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
> (Int_t)(1)
>
> Info in <TProofServ::SetQueryRunning> on master-0: starting query: 1
> Info in <TAdaptivePacketizer::TAdaptivePacketizer> on master-0: fraction of remote files 1.000000
> SysError in <TUnixSystem::UnixSend> on master-0: send (Connection reset by peer)
> SysError in <TUnixSystem::DispatchOneEvent> on master-0: select: read error on 4
> (Bad file descriptor)
>
> =====================
>
> The problem I am facing seems to be quite random.
>
> As for the meeting this coming Friday, I need to resolve this issue
> before then, since I do not have anything new after the RCF upgrades.
>
> Selemon,
>
> -----Original Message-----
> From: brahms-dev-l-bounces_at_lists.bnl.gov on behalf of Flemming Videbaek
> Sent: Tue 9/18/2007 9:34 AM
> To: devlist
> Subject: [Brahms-dev-l] analysis meeting
>
> We will have an analysis meeting this coming Friday.
> There are planned presentations from Selemon on CuCu and from Ionut on AuAu at 62 GeV.
>
> The agenda page on Indico has been set up for this meeting.
>
> Flemming
>
> --------------------------------------------
> Flemming Videbaek
> Physics Department
> Bldg 510-D
> Brookhaven National Laboratory
> Upton, NY11973
>
> phone: 631-344-4106
> cell:       631-681-1596
> fax:        631-344-1334
> e-mail: videbaek @ bnl gov
>
>
>
>
>
>
> 

_______________________________________________
Brahms-dev-l mailing list
Brahms-dev-l_at_lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/brahms-dev-l
Received on Thu Sep 20 2007 - 14:32:34 EDT

This archive was generated by hypermail 2.2.0 : Thu Sep 20 2007 - 14:35:37 EDT