Re: [Brahms-dev-l] Brat update

From: Ian Bearden <bearden@nbi.dk>
Date: Tue Apr 20 2004 - 03:44:08 EDT
  I suppose my bandwidth was exhausted when I submitted all these jobs :)
I did not even think about the fact that all the data were on the same 
disk (though I probably should have) when I submitted.
I agree that we should do something, but I am not sure I like the 
proposals below, where I comment a bit more.
Sorry I clogged the system so thoroughly yesterday. (Or, to say it 
more positively: luckily, I was able to expose a serious design flaw 
in our analysis system...)
On 20/4-2004, at 02:33, flemming videbaek wrote:

> Some further progress has been made toward resolving the memory leak 
> problem for the DB stuff.
> There is still some issue left, though at a much lower level than 
> before. Since the browsers work,
> I assume standard code is also OK, but please notify me of any problems.
>  
> Flemming
>  
> PS On an unrelated issue, we have to find a way to load the 
> crs-node (and cas) in such a way that normal access is not utterly 
> disrupted. This afternoon (EST) a large set of gtr-runs were executing 
> on the crs node. Since these jobs used the data disk data08 for 
> both reading and writing, access to the Brahms user disk was extremely 
> slow. This is of course due to the fact that the gtr jobs are I/O limited, 
> and loading ~2*35 jobs in parallel just exhausts the bandwidth.
In hindsight, this is obvious.
> and data08 and =brahms/u happen to be on the same server machine.
This I did not know, should I have?
> One solution will be to allow only one job per user in the crs-queues 
> if this is possible.
I guess I do not understand this.  If the definition of "job" is, say, 
global tracking on one sequence, the user will have to spend a serious 
amount of time in submitting jobs.  Is there not some way of limiting 
the number of jobs performing IO to the same disk?  Now that I think 
about it, I guess it is not so clever to have the ltr, gtr, and dst on 
the same disk. For example, in my BrBooBoo yesterday, if the ltr and 
gtr had been on different disks, the problem would only have been half 
as bad (provided that the disks were on different servers?)
> This will still give a considerable
> throughput w/o effectively wasting people's time when the user disks are 
> accessed. Another solution is to move the user disk + www to a server 
> by itself.
I think that the user disk should probably be on a separate server, in 
addition to whatever other solutions we find.
> I find it unreasonable that a normal compile job in a brat subsection 
> takes 10 minutes vs ~1 minute under normal circumstances.
Yes, it is unreasonable.
How can I submit a large number of jobs to crs without causing this 
problem?  Should one limit the number of processes to N by only 
including N rcas nodes in the processor list?
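To make the idea of limiting I/O per disk concrete, here is a minimal 
sketch of a submit-side throttle that lets at most N jobs touch the same 
disk at once. It is only an illustration: the function name, the 
(disk, sequence) job tuples and the run_job callback are all hypothetical, 
and nothing here assumes that the crs queues themselves offer such a limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_disk_limit(jobs, max_per_disk, run_job):
    """Run every (disk, sequence) job, with at most max_per_disk jobs
    using the same disk at any one time. All names are hypothetical."""
    # One semaphore per disk, created up front so worker threads only read the dict.
    limits = {disk: threading.BoundedSemaphore(max_per_disk) for disk, _ in jobs}

    def guarded(job):
        disk, sequence = job
        # Block here until fewer than max_per_disk jobs are active on this disk.
        with limits[disk]:
            return run_job(disk, sequence)

    # One thread per job: waiting jobs simply sleep on their disk's semaphore.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return list(pool.map(guarded, jobs))
```

Here run_job would be whatever actually starts one tracking job and waits 
for it to finish. Jobs reading from different disks are not held back by 
each other, so spreading the ltr and gtr files over separate disks (on 
separate servers) would raise the total throughput the limit allows.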
-Ian
>  
>  
> ----------------------------------------------------------------
> Flemming Videbaek
> Physics Department
> Brookhaven National Laboratory
>  
> e-mail: videbaek@bnl.gov
> phone: 631-344-4106
> _______________________________________________
> Brahms-dev-l mailing list
> Brahms-dev-l@lists.bnl.gov
> http://lists.bnl.gov/mailman/listinfo/brahms-dev-l




Received on Tue Apr 20 03:44:22 2004

This archive was generated by hypermail 2.1.8 : Tue Apr 20 2004 - 03:44:42 EDT