From: Ian Bearden (bearden@nbi.dk)
Date: Sat Mar 15 2003 - 15:07:29 EST
Hi Claus,

I just checked, and this time I don't see any of my jobs. But I do notice that there is a job that has been running since March 4, and it is followed by 9 lines that are blank except for the exec host (these are: rcas0006, 0023, 0017, 0011, 0019). If I run 'top' on rcas0023, I don't see any jobs taking CPU. I guess that bsub won't start another job on a machine as long as two are running there, but I don't know if that is what happens. If the system is 'smart' enough to check CPU usage rather than the number of jobs, I guess it won't matter.

Is there anyone who can kill these old (and presumably dead) jobs so that Claus can continue?

Cheers,
Ian

On Saturday, Mar 15, 2003, at 20:53 Europe/Copenhagen, Claus O. E. Jorgensen wrote:

> Hi,
>
> I've been trying to redo the global tracking and some of the tof
> calibrations, but twice in the last weeks the rcas machines have been
> swamped by unreasonable jobs (e.g. jobs hanging in an infinite loop).
>
> There are two obvious solutions to this problem: check your jobs
> now and then to see if they look OK or (the easy way) put a max CPU
> time limit on your jobs. Just use the -c option ("bsub ... -c 120"
> gives a max of two hours). And then of course, don't start 200 jobs
> that each take 10 hours, at least not without informing the rest of
> us that you will occupy the machines for ... well, you can do the
> calculation yourself.
>
> Cheers,
>
> Claus
>
> +------------------------------------------------------------+
> | Claus E. Jørgensen           Phone  : (+45) 33 32 49 49    |
> | Cand. Scient. (M. Sc.)       Cell   : (+45) 27 29 49 49    |
> |                              Office : (+45) 35 32 54 04    |
> | Niels Bohr Institute, Ta-2,  Fax    : (+45) 35 32 50 16    |
> | Blegdamsvej 17, DK-2100,     E-mail : ekman@nbi.dk         |
> | University of Copenhagen     Home   : www.nbi.dk/~ekman/   |
> +------------------------------------------------------------+
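
For reference, the CPU limit Claus suggests can be sketched as below. This is only an illustration and not something taken from the thread: the helper names and the ./my_job.sh script are made up, and the command line is printed rather than submitted. It assumes, as in Claus's example, that the value given to -c is in minutes.

```shell
# Convert a CPU limit in hours to the minutes value passed to "bsub -c".
# (hours_to_minutes is a hypothetical helper, not part of LSF.)
hours_to_minutes() {
    echo $(( $1 * 60 ))
}

# Print (do not run) a bsub command line with a CPU time limit.
# The first argument is the limit in hours, the rest is the job command.
limited_bsub_cmd() {
    hours=$1
    shift
    echo "bsub -c $(hours_to_minutes "$hours") $*"
}

# ./my_job.sh is a placeholder for the real job script.
limited_bsub_cmd 2 ./my_job.sh

# CPU hours requested by 200 jobs of 10 hours each.
echo "$(( 200 * 10 )) CPU hours"
```

How long the machines stay occupied then depends on how many jobs can run at once, which is not stated here.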
This archive was generated by hypermail 2.1.5 : Sat Mar 15 2003 - 15:08:46 EST