Re: MainModule - What is a run?

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Tue Jun 05 2001 - 14:21:33 EDT


    Hi Flemming et al, 
    
    On Tue, 5 Jun 2001 12:42:26 -0400
    "Flemming Videbaek" <videbaek@sgs1.hirg.bnl.gov> wrote
    concerning ": MainModule  - What is a run?":
    > Christian,
    > 
    > It seems that you have defined a run to be the reading of one file
    > in MainModule as well, as in the definition of the BrIOModule.  
    
    More precisely, a "run-level", as I believe it says in the
    documentation. 
    
    > When either an EOF is reached or the number of events reaches the
    > limit, control transfers out of the event loop, calls End(), and
    > then goes on to the next file.  
    
    That is correct. 
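    
    A minimal sketch of the control flow described above, in plain C++
    rather than actual BRAT code. FakeFile, Counters and runPerFile are
    hypothetical names, files are simulated by an event count, and the
    sketch assumes the event limit applies per file, with 0 meaning "no
    limit":

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one input file: a name and an event count.
struct FakeFile {
    std::string name;
    std::size_t nEvents;
};

// Counts how often each stage is entered.
struct Counters {
    std::size_t begins = 0;
    std::size_t events = 0;
    std::size_t ends = 0;
};

// One file == one run-level: Begin(), event loop, End(), then next file.
// maxEvents is the per-file event limit; 0 means "no limit" (an assumption
// of this sketch, not something taken from BRAT).
Counters runPerFile(const std::vector<FakeFile>& files, std::size_t maxEvents) {
    Counters c;
    for (const FakeFile& f : files) {
        ++c.begins;                                   // Begin()
        std::size_t seen = 0;
        while (seen < f.nEvents &&                    // stop at EOF ...
               (maxEvents == 0 || seen < maxEvents)) {  // ... or at the limit
            ++seen;
            ++c.events;                               // Event()
        }
        ++c.ends;                                     // End()
    }
    return c;
}
```

    With three sequence files of the same run, Begin() and End() are
    each entered three times, which is exactly the per-sequence
    behaviour in question.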
    
    > This is the correct behaviour if the next file is indeed a new run,
    > but not if it is another sequence of the same run. In that case you
    > really want just to continue reading and just go to End() when
    > everything is read.  
    
    This would require that BrIOModule as well as BrMainModule had another
    mode of operation. 
    
    > One reason here: e.g. for DB access you do not want this per
    > sequence file; for writing to the DB you would certainly not want
    > it on each sequence. 
    
    Yes, that's clear. 
     
    > One possible way of dealing with this is to add one more mode to the
    > BrIOModule, e.g. kBrSeqFile, with the implication that the file is
    > opened at Begin() and, on Eof(), the next file in the list is opened.
    > (The specific files should all be set by AddFile(), such that the
    > IOModule only has to deal with its list of files.)
    
    That is indeed what I'd suggest. 
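    
    A rough sketch of how such a sequence mode could behave, again in
    plain C++ and not the real BrIOModule. Only the names kBrSeqFile and
    AddFile() come from the proposal; SeqFileReader, readWholeRun and
    the simulated event counts are assumptions made for illustration:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a "sequence file" mode; files are simulated by
// event counts instead of being opened on disk.
class SeqFileReader {
public:
    // All files of the run are registered up front, as with AddFile().
    void AddFile(std::string name, std::size_t nEvents) {
        fFiles.push_back({std::move(name), nEvents});
    }

    // Begin(): "open" the first file in the list.
    void Begin() {
        fCurrent = 0;
        fRead = 0;
        ++fBegins;
    }

    // Reads one event. On EOF of the current file the next file in the
    // list is opened transparently; returns false only when every file
    // has been read.
    bool Event() {
        while (fCurrent < fFiles.size() && fRead >= fFiles[fCurrent].nEvents) {
            ++fCurrent;  // EOF: move on to the next sequence file
            fRead = 0;
        }
        if (fCurrent >= fFiles.size()) return false;
        ++fRead;
        return true;
    }

    void End() { ++fEnds; }

    std::size_t Begins() const { return fBegins; }
    std::size_t Ends() const { return fEnds; }

private:
    struct Entry {
        std::string name;
        std::size_t nEvents;
    };
    std::vector<Entry> fFiles;
    std::size_t fCurrent = 0;  // index of the file currently "open"
    std::size_t fRead = 0;     // events already read from that file
    std::size_t fBegins = 0;
    std::size_t fEnds = 0;
};

// Drives one full run: Begin() once, read all sequences, End() once.
std::size_t readWholeRun(SeqFileReader& io) {
    io.Begin();
    std::size_t n = 0;
    while (io.Event()) ++n;
    io.End();
    return n;
}
```

    The point of the design is that the file list stays inside the IO
    module, so the main module sees a single Begin()/End() pair for the
    whole run and never needs to know about files.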
     
    > -- or
    > the logic is built into the MainModule, but since it does not know
    > anything about files it is not so nice. 
    
    I prefer your first alternative. 
    
    I was indeed aware of this problem when I wrote BrMainModule and
    modified the standard BrModule methods of BrIOModule (Init, Begin,
    Event, End, Finish), but I didn't make an effort to solve it, because: 
    
    * When one is reading through raw data files, it should happen in some
      parallel environment like CRF.  Here only one sequence is input per
      job, so there's no "multiple input problem" 
    
    * Subsequently one will merge the output (one per sequence) of the
      pass over the raw files, into one file, representing a full run.
    
    * Then, when doing additional analysis, each file will represent one
      run, and Begin/End does indeed correspond to run boundaries. 
    
    Of course there may be situations where any of this may fail to work
    (the reconstructed data cannot fit into one file, i.e., it hits
    the 2GB file limit, or one really wants to loop, in one job, over
    the individual sequences), and so it would be nice to have the
    additional functionality.  However, back then I didn't consider it a
    paramount concern. 
    
    Yours, 
    
    
    
    Christian  -----------------------------------------------------------
    Holm Christensen                             Phone:  (+45) 35 35 96 91 
      Sankt Hansgade 23, 1. th.                  Office: (+45) 353  25 305 
      DK-2200 Copenhagen N                       Web:    www.nbi.dk/~cholm    
      Denmark                                    Email:       cholm@nbi.dk
    


