[Brahms-dev-l] proof

From: Flemming Videbaek <videbaek_at_bnl.gov>
Date: Thu, 20 Sep 2007 12:58:32 -0400
Hi,

I would really like to know how you access the nodes and run, i.e. do you have many TDSet *set; set->Process("selector","",...) calls
in a sequence? I have seen that such sequences can make the running processes grow. I also see that in the session running right now
only the processes on 62 get any CPU time. Where/when do you call ->SetParameter("PROOF_MaxSlavesPerNode", ...)?

It does look peculiar. I do know that not all memory is released by the slaves at the end of a Process().

Flemming





--------------------------------------------
Flemming Videbaek
Physics Department 
Bldg 510-D
Brookhaven National Laboratory
Upton, NY11973

phone: 631-344-4106
cell:       631-681-1596
fax:        631-344-1334
e-mail: videbaek @ bnl gov
----- Original Message ----- 
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: "Flemming Videbaek" <videbaek_at_bnl.gov>
Cc: "JH Lee" <jhlee_at_bnl.gov>; <brahms-dev-l_at_lists.bnl.gov>
Sent: Thursday, September 20, 2007 12:51 PM
Subject: RE: [Brahms-dev-l] analysis meeting




Hi Flemming,

    I have been monitoring the memory size of my PROOF sessions.
About an hour and 20 minutes into the sessions:

the virtual memory size (VSZ) of the slaves on 63 - 68 is about 235 MB;
the virtual memory size of the slaves on 62 is about 2 GB.

Is there any reason why the memory size of the slaves on 62 should grow
to about 10 times that of the slaves on the other machines? Could the load
be shared unevenly between the slaves?

Selemon,

============================

rcas0062:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist   26091  1.3  2.7 127924 56976 ?      Ss   11:19   0:56 /opt/brahms/pro/bin/proofserv.exe proofserv
tigist   26577 68.7 40.9 2108652 849448 ?    Rs   11:20  47:53 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist   26578 68.2 40.6 2053872 843488 ?    Ds   11:20  47:30 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0063:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist   14298  0.0  9.8 235092 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist   14300  0.0  9.8 234548 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0064:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist   15736  0.0  9.8 233564 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist   15737  0.0  9.8 235480 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0065:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist   16870  0.0  9.8 234236 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist   16871  0.0  9.8 235148 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0066:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist   16629  0.0  9.8 234880 203920 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist   16634  0.0  9.8 235464 203924 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0067:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist    9125  0.0  9.8 233836 203932 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist    9130  0.0  9.8 234328 203920 ?     Ss   11:20   0:03 /opt/brahms/pro/bin/proofserv.exe proofslave

rcas0068:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
tigist    2143  0.0  9.8 235068 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave
tigist    2145  0.0  9.8 234620 203928 ?     Ss   11:20   0:04 /opt/brahms/pro/bin/proofserv.exe proofslave

-----Original Message-----
From: Flemming Videbaek [mailto:videbaek_at_bnl.gov]
Sent: Tue 9/18/2007 3:07 PM
To: Bekele, Selemon
Cc: JH Lee
Subject: Re: [Brahms-dev-l] analysis meeting
 
Hi Selemon,

I see you are running proofserv.exe slaves on rcas0062, or maybe you are not; in any case their memory size is 2.3 GB per process.
Are you sure the processes do not have memory leaks?

Flemming

----- Original Message ----- 
From: "Bekele, Selemon" <bekeleku_at_ku.edu>
To: "Flemming Videbaek" <videbaek_at_bnl.gov>; "devlist" <brahms-dev-l_at_lists.bnl.gov>
Sent: Tuesday, September 18, 2007 1:40 PM
Subject: RE: [Brahms-dev-l] analysis meeting



Hi Flemming,

    In order to make sure that I am not doing something
wrong, I ran over the 57 files individually again. This
time it was a different run, 13345, that had empty histograms.
I ran over the same file locally and with PROOF and found no
problem with the file. The error message from /var/log/ROOT.log
on rcas0055:
=======================
Sep 17 22:40:13 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4  (Bad file descriptor)
Sep 17 22:40:13 rcas0055 last message repeated 3 times
Sep 17 22:40:18 rcas0055 proofserv[22320]: tigist:master-0:SysError:<TUnixSystem::DispatchOneEvent>:select: read error on 4  (Bad file descriptor)
======================

Part of the PROOF log file (~tigist/ProofOutPut.dat) is shown below.
It seems that a connection to some machine is reset at the very beginning of
the session. Could this be a reset of a connection to a database in BRAHMS?

=======================
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)
(Int_t)(1)

Info in <TProofServ::SetQueryRunning> on master-0: starting query: 1
Info in <TAdaptivePacketizer::TAdaptivePacketizer> on master-0: fraction of remote files 1.000000
SysError in <TUnixSystem::UnixSend> on master-0: send (Connection reset by peer)
SysError in <TUnixSystem::DispatchOneEvent> on master-0: select: read error on 4 (Bad file descriptor)

=====================

The problem I am facing seems to be quite random.

As for the meeting this coming Friday, I need to resolve this issue
before then, since I do not have anything new since the RCF upgrades.

Selemon,

-----Original Message-----
From: brahms-dev-l-bounces_at_lists.bnl.gov on behalf of Flemming Videbaek
Sent: Tue 9/18/2007 9:34 AM
To: devlist
Subject: [Brahms-dev-l] analysis meeting

We will have an analysis meeting this coming Friday.
There are planned presentations from Selemon on Cu+Cu and from Ionut on Au+Au at 62 GeV.

The agenda page on Indico has been set up for this meeting.

Flemming





_______________________________________________
Brahms-dev-l mailing list
Brahms-dev-l_at_lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/brahms-dev-l
Received on Thu Sep 20 2007 - 12:59:39 EDT
