Re: [Brahms-dev-l] proof problems

From: Stephen Sanders <ssanders@ku.edu>
Date: Fri Mar 24 2006 - 21:07:54 EST
I can add another data point: I just inadvertently started two sessions on the
same master. The master is now dead (41). While I did this by mistake, it does
suggest the need to check whether someone else is already running a session
before starting a new one. My code will run OK in isolation...
...steve
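
A rough sketch of the "is a session already running?" check suggested above,
assuming it is run on the master and that sessions show up as processes named
proofserv (the name seen in the log excerpts further down this thread). The
function names are made up for illustration, and the ps flags assume a Linux
procps-style ps.

```python
import subprocess


def parse_proofserv_owners(ps_output):
    """Return {user: [pid, ...]} for every proofserv process in ps output.

    Expects header-less lines of the form "USER PID COMMAND".
    """
    owners = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        user, pid, command = fields
        # Assumption: a live session appears as a process named "proofserv".
        if command.strip() != "proofserv" or not pid.isdigit():
            continue
        owners.setdefault(user, []).append(int(pid))
    return owners


def existing_sessions():
    """Ask ps on the local machine for proofserv processes, grouped by user."""
    out = subprocess.run(
        ["ps", "-eo", "user=,pid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_proofserv_owners(out)
```

Calling existing_sessions() before connecting and refusing to start if it
returns anything (including one's own user name) would have caught the double
start described above. It only sees processes on the machine where it runs.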
On Mar 24, 2006, at 6:07 PM, Flemming Videbaek wrote:

> This has been a problem for a while, and as Erik (as well as Steve) has
> indicated, it also seems to happen when (apparently) nothing is done in the
> sessions. On the other hand, many sessions do in fact work. I know that
> Richard Hogue is getting a bit frustrated rebooting the machines, so it will
> be important for us to try to find the underlying problem.
> There is indeed some indication that this is related to the root-5.10
> upgrade, but it has to be documented in a reasonable way if we are to
> approach the ROOT team. There are NO bug reports on the ROOT page about
> PROOF in this version, so it is quite conceivable that the problem is due
> to how we do this.
>
> regards
>
> --------------------------------------------
> Flemming Videbaek
> Physics Department
> Bldg 510-D
> Brookhaven National Laboratory
> Upton, NY 11973
>
> phone: 631-344-4106
> fax:        631-344-1334
> e-mail: videbaek @ bnl.gov
> ----- Original Message -----
> From: "Hironori Ito" <hito@rcf.rhic.bnl.gov>
> To: "Brahms Dev" <brahms-dev-l@lists.bnl.gov>
> Sent: Friday, March 24, 2006 4:46 PM
> Subject: Re: [Brahms-dev-l] proof problems
>
>
>> First, check the proof/root log. It is located in /var/log/Root.log
>> (or Root.log.1 for the older log) on every machine running rootd/proofd.
>> For example, I can see
>>
>> Mar 22 13:12:15 rcas0055 proofslave[8161]: ebj:slave 0.0:Error:<TFile::Init>:file
>> /brahms/data21/data/run04/auau/200/r10844/dst/dst010844v2p3.root is truncated
>> at 147895193 bytes: should be 149235656, trying to recover
>>
>> or,
>>
>> tigist:master0:Error:<TPacketizer::ValidateFiles>:cannot get entries for
>> /brahms/data21//data/run05/cucu/200/r14123/dst/dst014123v3p2.root (
>> Mar 22 19:11:50 rcas0055 proofserv[31758]: tigist:master0:*** Break ***:segmentation violation
>>
>> etc..
>>
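
To make the log check above less tedious, here is a minimal sketch that
counts the kinds of messages shown in the two excerpts. The regular expression
is modelled only on those excerpts, so other message formats are ignored, and
the function names are hypothetical. The default path is the one Hiro gives.

```python
import re
from collections import Counter

# Assumption: severe entries look like the excerpts quoted in this thread,
# i.e. "Error:<Class::Method>:..." or "*** Break ***:<reason>".
SEVERE = re.compile(
    r"Error:<(?P<where>[^>]+)>|\*\*\* Break \*\*\*:(?P<signal>.+)"
)


def summarize(lines):
    """Count Error origins and Break reasons in Root.log-style lines."""
    counts = Counter()
    for line in lines:
        match = SEVERE.search(line)
        if not match:
            continue
        if match.group("where"):
            counts["Error in " + match.group("where")] += 1
        else:
            counts["Break: " + match.group("signal").strip()] += 1
    return counts


def summarize_file(path="/var/log/Root.log"):
    """Summarize one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return summarize(log)
```

Running summarize_file() on each node and printing the counts with
most_common() gives a quick view of which nodes report truncated files or
segmentation violations.
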
>>
>> Also, clean up your package (or make one) so that you don't get
>> warnings. The log is full of warnings about missing dictionaries for
>> classes from the dst. You just need to load them in the SETUP.C of your
>> package.
>>
>>
>> Hiro
>>
>>
>> Johnson, Erik B wrote:
>>
>>> Brahms,
>>>  I have not been involved in any discussions about proof, but we do
>>> have a major problem with memory.  Last night I ran a proof session
>>> which died on me because it used up all the available memory.  There
>>> was nothing special about this session other than that I was testing
>>> out my code.  There could be something wrong with my code, but that
>>> does not explain why a proof session uses up more and more memory when
>>> I do NOTHING!!!  If I want to load in a number of libraries, I'm using
>>> up more memory at the start.
>>> Yesterday I ran over all the auau 200GeV data filling a number of
>>> histograms.  I created 8622 histograms (a good number of them were not
>>> filled with any events) in a proof session.  The session ran fine.
>>> Last night, I ran another session to test some code and I tried to
>>> process all of the auau 200GeV data.  I created 11 histograms, loaded
>>> in a library, and the proof session hung with a memory leak.  Here is
>>> the response I got from RCF:
>>>
>>> OK - I'll reboot some of the nodes that are still down but Brahms
>>> needs to come to grips with how to run proofserv.  You can't
>>> expect to run multiple proofserv processes >1.7GB each without  
>>> driving swap down to ZERO, even with 2G of memory, possibly  
>>> crashing it.
>>> If this occurs on a weekend, you will have to wait until Monday.
>>>
>>> Open to suggestions if there is anything we can do at this end.
>>>
>>> --Richard Hogue
>>>
>>>
>>> Now does anyone have a good idea on how one can approach and fix  
>>> this problem?
>>> Erik
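
One modest idea, sketched under assumptions: watch the resident memory of the
proofserv processes so that a runaway session is noticed before swap is gone.
The function below only parses ps output; the 1,700,000 kB default simply
mirrors the 1.7 GB figure in the RCF reply and is a placeholder, and the
function name is made up.

```python
def oversized_proofserv(ps_output, limit_kb=1_700_000):
    """Return [(pid, rss_kb)] for proofserv processes with RSS above limit_kb.

    Expects header-less "PID RSS COMMAND" lines, for example from
    `ps -eo pid=,rss=,comm=` on Linux, where RSS is reported in kilobytes.
    The largest process comes first.
    """
    flagged = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or fields[2].strip() != "proofserv":
            continue
        try:
            pid, rss_kb = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip malformed lines rather than abort
        if rss_kb > limit_kb:
            flagged.append((pid, rss_kb))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

This does not explain the leak, but logging its result every few minutes
would show whether memory grows while a session is idle, which is the kind of
documentation needed before approaching the ROOT team.
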
>>> _______________________________________________
>>> Brahms-dev-l mailing list
>>> Brahms-dev-l@lists.bnl.gov
>>> http://lists.bnl.gov/mailman/listinfo/brahms-dev-l
>>>

Received on Fri Mar 24 21:08:42 2006
