From: Ian Bearden (bearden@nbi.dk)
Date: Sat Mar 15 2003 - 15:07:29 EST
Hi Claus,
I just checked...this time I don't see any of my jobs...
But I do notice a job that has been running since March 4, followed
by 9 lines that are blank except for the exec host (the hosts are
rcas0006, 0023, 0017, 0011, 0019). If I run 'top' on rcas0023, I don't
see any jobs taking CPU.
I guess that bsub won't start another job as long as two are already
running, though I don't know if that is what actually happens. If the
system is 'smart' enough to check CPU usage rather than the number of
jobs, I guess it won't matter...
Is there anyone who can kill these old (and presumably dead) jobs so
that Claus can continue?
Cheers,
Ian
On Saturday, Mar 15, 2003, at 20:53 Europe/Copenhagen, Claus O. E.
Jorgensen wrote:
>
> Hi,
>
> I've been trying to redo the global tracking and some of the tof
> calibrations, but twice in the last few weeks the rcas machines have been
> swamped by unreasonable jobs (e.g. jobs hanging in an infinite loop).
>
> There are two obvious solutions to this problem: check your jobs
> now and then to see if they look OK or (the easy way) put a max CPU time
> limit on your jobs. Just use the -c option ("bsub ... -c 120" gives a
> max of two hours). And then of course, don't start 200 jobs that each
> take 10 hours, at least not without informing the rest of us that you
> will occupy the machines for ... well, you can do the calculation
> yourself.
>
> Cheers,
>
> Claus
>
> +------------------------------------------------------------+
> | Claus E. Jørgensen           Phone  : (+45) 33 32 49 49    |
> | Cand. Scient. (M. Sc.)       Cell   : (+45) 27 29 49 49    |
> |                              Office : (+45) 35 32 54 04    |
> | Niels Bohr Institute, Ta-2,  Fax    : (+45) 35 32 50 16    |
> | Blegdamsvej 17, DK-2100,     E-mail : ekman@nbi.dk         |
> | University of Copenhagen     Home   : www.nbi.dk/~ekman/   |
> +------------------------------------------------------------+
>
This archive was generated by hypermail 2.1.5 : Sat Mar 15 2003 - 15:08:46 EST