I can add another data point: I just inadvertently started two sessions
on the same master. The master is now dead (41). While I did this by
mistake, it does suggest the need to check whether someone else is
already running a session before starting a new one. My code will run
OK in isolation...

...steve

On Mar 24, 2006, at 6:07 PM, Flemming Videbaek wrote:

> This has been a problem for a while, and as Erik (as well as Steve)
> has indicated, this also seems to happen when (apparently) nothing is
> done in the sessions. On the other hand, many sessions do in fact
> work. I know that Richard Hogue is getting a bit frustrated
> re-booting the machines, so it will be important for us to try to
> find the underlying problem.
> There is indeed some indication this is related to the root-5.10
> upgrade, but it has to be documented in a reasonable way if we are
> to approach the ROOT team. There are NO bug reports on the root
> page on proof in this version, so it is quite conceivable that the
> problem is due to how we do this.
>
> regards
>
> --------------------------------------------
> Flemming Videbaek
> Physics Department
> Bldg 510-D
> Brookhaven National Laboratory
> Upton, NY11973
>
> phone: 631-344-4106
> fax: 631-344-1334
> e-mail: videbaek @ bnl.gov
>
> ----- Original Message -----
> From: "Hironori Ito" <hito@rcf.rhic.bnl.gov>
> To: "Brahms Dev" <brahms-dev-l@lists.bnl.gov>
> Sent: Friday, March 24, 2006 4:46 PM
> Subject: Re: [Brahms-dev-l] proof problems
>
>> First, check the proof/root log. It is located in
>> /var/log/Root.log, or Root.log.1 (for the older log), on every
>> machine with rootd/proofd. For example, I can see
>>
>> Mar 22 13:12:15 rcas0055 proofslave[8161]: ebj:slave 0.0:Error:<TFile::Init>:file /brahms/data21/data/run04/auau/200/r10844/dst/dst010844v2p3.root is truncated at 147895193 bytes: should be 149235656, trying to recover
>>
>> or,
>>
>> tigist:master0:Error:<TPacketizer::ValidateFiles>:cannot get entries for /brahms/data21//data/run05/cucu/200/r14123/dst/dst014123v3p2.root (
>> Mar 22 19:11:50 rcas0055 proofserv[31758]: tigist:master0:*** Break ***:segmentation violation
>>
>> etc.
>>
>> Also, clean up your package (or make one) to make sure that you
>> don't get warnings. The log is full of warnings about missing
>> dictionaries for classes from the dst. You just need to load them
>> in the SETUP.C of your package.
>>
>> Hiro
>>
>> Johnson, Erik B wrote:
>>
>>> Brahms,
>>> I have not been involved in any discussions about proof, but we
>>> do have a major problem with memory. Last night I ran a proof
>>> session which died on me because it used up all the available
>>> memory. Now there was nothing special about this session other
>>> than I was testing out my code. Now there could be something
>>> wrong with my code, but this does not explain why a proof session
>>> uses up more and more memory when I do NOTHING!!! Now if I want
>>> to load in a number of libraries, I'm using up more memory at the
>>> start.
>>> Yesterday I ran over all the auau 200GeV data, filling a number of
>>> histograms. I created 8622 histograms (a good number of them
>>> were not filled with any events) in a proof session. The session
>>> ran fine.
>>> Last night, I ran another session to test some code and I tried
>>> to process all of the auau 200GeV data. I created 11 histograms,
>>> loaded in a library, and the proof session hung with a memory
>>> leak. Here is the response I got from RCF:
>>>
>>> OK - I'll reboot some of the nodes that are still down but Brahms
>>> needs to come to grips with how to run proofserv. You can't
>>> expect to run multiple proofserv processes >1.7GB each without
>>> driving swap down to ZERO, even with 2G of memory, possibly
>>> crashing it.
>>> If this occurs on a weekend, you will have to wait until Monday.
>>>
>>> Open to suggestions if there is anything we can do at this end.
>>>
>>> --Richard Hogue
>>>
>>>
>>> Now does anyone have a good idea on how one can approach and fix
>>> this problem?
>>> Erik
>>> _______________________________________________
>>> Brahms-dev-l mailing list
>>> Brahms-dev-l@lists.bnl.gov
>>> http://lists.bnl.gov/mailman/listinfo/brahms-dev-l

Received on Fri Mar 24 21:08:42 2006
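
Hiro's suggestion in the thread, to check /var/log/Root.log on each
machine, could be partly automated. The following is only a minimal
sketch in Python, not an existing tool. It assumes each log entry sits
on a single line, and it looks only for the three message shapes quoted
in the thread. The category names (truncated_file, missing_entries,
segfault) and the function names are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpts quoted in the thread.
# The category names are illustrative labels, not ROOT terminology.
PATTERNS = {
    "truncated_file": re.compile(
        r"Error:<TFile::Init>:file\s+(\S+)\s+is truncated"
    ),
    "missing_entries": re.compile(
        r"Error:<TPacketizer::ValidateFiles>:cannot get entries for\s+(\S+)"
    ),
    "segfault": re.compile(r"\*\*\* Break \*\*\*:\s*segmentation violation"),
}


def scan_log(lines):
    """Count matches per category and collect any file paths mentioned."""
    counts = Counter()
    files = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[name] += 1
            # Only the file-related patterns capture a path.
            if match.groups():
                files.setdefault(name, []).append(match.group(1))
    return counts, files


def scan_log_file(path="/var/log/Root.log"):
    """Scan a log file; the default path is the one named in the thread."""
    with open(path, errors="replace") as handle:
        return scan_log(handle)
```

Calling scan_log_file() on a node (and again with Root.log.1 for the
older log) returns the per-category counts plus the list of affected
files, which may help when documenting the problem for the ROOT team.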