Re: Proposing BRAT/bratmain addition (PLEASE READ!)

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Wed Dec 11 2002 - 17:55:25 EST

  • Next message: Marco Germinario: "coalescence"
    Hi Bjorn et al (in particular Djam, Kris, and Flemming), 
    
    First off, thank you for opening this bag (sack?) of worms - it's an
    important issue and needs careful deliberation before implementation.
    But, I'm getting ahead of my self here - first some comments to your
    proposal. 
    
    Bjorn H Samset <bjornhs@rcf2.rhic.bnl.gov> wrote concerning
      Proposing BRAT/bratmain addition (PLEASE READ!) [Tue, 10 Dec 2002 14:24:28 -0500 (EST)] 
    ----------------------------------------------------------------------
    > 
    > Hello dev'ils.
    > 
    <snip>
    ...
    </snip>
    > //_________________________________________________________________
    > void BrTileRdoModule::Snapshot(ofstream &f, Option_t* option)
    > {
    >   // Output info on all settable members to file
    >   // Must be overloaded by all modules!
    >   TString opt(option);
    >   opt.ToLower();
    > 
    >   if (opt.Contains("e"))
    >     f << "Module: " << IsA()->GetName() << " " << GetName() << " " <<
    > GetTitle() << endl;
    > 
    >   BrMultRdoModule::Snapshot(f);
    >   f << PrintIntMember("AdcGapLimit",fAdcGapLimit) << endl;
    >   f << PrintIntMember("SaturationChannel",fSaturationChannel) << endl;
    > 
    >   if (opt.Contains("e")) {
    >     f << "# CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $
    > $Author: sanders $" << endl;
    >     f << endl;
    >   }
    > 
    > }
    > 
    > (PrintIntMember and similar are functions of BrModule...)
    
    Argh! What's so difficult in writing 
    
      f << setw(8) << fIntegerMember << std::endl;
    
    compared to 
    
      f << PrintIntmMember(fIntegerMember) << std::endl;
    
    The additional cluttering of the base class isn't worth the gain.  
    
    <snip>
    ...
    </snip>
    > 
    > A test bratmain script which contains only the BrTileRdoModule then looks
    > like this:
    > 
    > ---------------------------------------------------
    > Module: BrEventIO BRAHMS ROOT IO File eventInput
    > 
    > Module: BrTileRdoModule MultTile MultTile
    >          AdcGapLimit                4096
    >    SaturationChannel               30000
    >      ThresholdFactor            5.000000
    >        OutlierMethod                   3
    >         SingleAdcLow            0.000000
    >        SingleAdcHigh          500.000000
    >        SingleAdcComp           10.000000
    >          AdcGapLimit                4096
    >    SaturationChannel               30000
    > # CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $ $Author:
    > sanders $
    
    This format isn't particularly parser-friendly.  Instead, we should
    mimic the `TObject::SavePrimitive' member function - that is save
    proper C++ statements: 
    
      //__________________________________________________________________
      // Module: BrTileRdoModule
      BrTileRdoModule* tileRdoModule = 
        new BrTileRdoModule("MultTile", "TMA reducer");
      tileRdoModule->SetAdcGapLimit(4096);
      tileRdoModule->SetSaturationChannel(30000);
      tileRdoModule->SetThresholdFactor(5);
      tileRdoModule->SetOutlierMethod(BrMultRdoModule::kNoCorrection);
      tileRdoModule->SetSingleAdcLow(0);
      tileRdoModule->SetSingleAdcHigh(500);
      tileRdoModule->SetSingleAdcComp(10);
      mainModule->AddModule(tileRdoModule);
      ... 
    
    In this way, the log file is directly `executable' via `bratmain'. 
    
    > 
    > Module: BrEventIO BRAHMS ROOT IO File ouputModule
    > --------------------------------------------------
    > 
    > I'll get rid of the BrEventIOs in the next implementation - they shouldn't
    > be there I think (see below for why).
    > 
    > Also note that two of the parameters occur twice - that's because
    > BrTileRdoModule has two setters that are identical to its parent class
    > BrMultRdoModule. This framework also incourages cleaner coding ;-)
    
    The particular code you refer to _is_ clean.  If you check the usage
    of the particular data members, you'll see the point of the overloaded
    functions: provide specific defaults for TMA/SMA.  The problem is
    rather in the code you sent - the Br{Tile,Si}RdoModule shouldn't emit
    the setting of `fSaturationChannel' etc., as it is BrMultRdoModule
    that owns those parameters, not Br{Tile,Si}RdoModule.   I guess you're
    confusing data members and member functions, and which one defines
    `ownership'. 
    
    In the future, please check your code before pinning `unclean coding'
    on others code, thank you. 
    
    > The point of this excercise is two-fold:
    > * First of all we get proper documentation of all relevant
    >   parameters for a given data reconstruction. This has been missing
    >   for a while and caused problems for both me and several others I
    >   have talked to. 
    
    Here's my main hick-up with this approach.  
    
    What is the need of this log file?  What's the difference between the
    log file and the configuration file?  I'm afraid I don't see any.  
    
    Having the script is just as good (in fact better) than having the log
    file.  After all, the script does the job.   This really points to
    the problems of your proposal.  Please consider the following: 
    
     * When do you want the information? 
    
       - Do you want to take an output file, and figure out how this file
         was created?   
    
         In that case, the values of various parameters should be stored
         in the file itself, as it is really part of the data, just as
         cuts are really part of the data analysis results. 
    
       - Do you want to redo the file that you once had but is now gone,
         or make a similar pass on some other input data? 
    
         In that case, all you need is the configuration script and the
         command line options to `bratmain'. 
    
       - Do you want this information to be used for collaboration-wide
         production passes or for single user jobs? 
    
         In the former case, you can make a few assumptions that really
         simplifies the process - for example, you can assume that the job
         was executed at RCF in either CRS or CAS GNU/Linux farms.  You
         can also assume that the official database is available, and so
         on. 
    
         In the latter case, you can basically just pipe the output from
         the `bratmain' run into a file (with the appropriate verbosity
         level), and make your own personal filling system.  That output
         (with verbosity >= 3) is all you really need. 
    
    * What kind of information do you need? 
    
      You really only need to know what configuration script was used,
      it's version number (remember, BrMainModule has fields for setting
      various version information, as well as author information, and
      of course as short description) or CVS revision number, the command
      line options passed to `bratmain', the ROOT and BRAT version. 
    
      Given these pieces of information, you can easily redo a pass:
    
        cd  ~/src
    
        export CVSROOT=:pserver:cvs@root.cern.ch:/user/cvs 
        cvs login 
        Password: cvs 
        cvs checkout -d root-<version> root -r <version-tag>
        cd root-<version> 
        ./configure <arch> --prefix=${HOME} 
        make 
        make install
    
        cd ..
        export PATH=${HOME}/bin:${PATH}
    
        export CVSROOT=/afs/brhic/brahms/BRAHMS_CVS 
        cvs checkout -d brat-<version> brat -r <version-tag> 
        cd brat-<version> 
        ./autogen --prefix=${HOME}
        make 
        make install 
    
        cd .. 
    
        cvs checkout -d brat-<version> brat/scripts/<configuration-script> \
           -r <configuration-script-revision-or-tag>
        cp brat-<version>/scripts/<configuration-script> ~/scripts
      
        cd /scr/<user-name>
        bratmain scripts/<configuration-script> \
          <command-line-options>
        
      This of course requires a certain amount of discipline when changing
      the official scripts in BRAT (and they _must_ be in BRAT), namely: 
    
        - If a parameter is changed on a module, the version (set in
          BrMainModule) must be changed, and the script must be committed  
          to CVS immediately.  Optionally, a tag can be made. 
    
        - If a parameter is changed from hard-coded to
          set-by-command-line-option, the version (set in BrMainModule)
          must be changed, and the script must be committed to 
          CVS immediately.  Optionally, a tag can be made.
    
        - When adding a module, changing the order of the modules,
          removing a module, or similar, the version (set in BrMainModule)
          must be changed, and the script must be committed to CVS
          immediately.  Optionally, a tag can be made.
    
        - ... probably a few other points too. 
    
      For calibration constants, and the like, we should record in the
      output file: 
      
        - the name of the detector,
        - the name of the parameter, 
        - the revision number of the parameter, 
        - ... other stuff. 
    
      Again, this should really be part of the output file rather than a
      log or job summary, as it is an intricate part of the data. 
    
    * Where you do you want to store the information? 
    
      - If you want to make plain text ASCII files, how should you manage
        these files?   
    
        To be quite honest, I think these plain text ASCII files are a
        pain in the behind that no-one in their right mind would like to
        manage.  Also, you need to make special parsers to (re)make the
        configuration scripts and command line options.  
    
        It is just so much simpler to make sure your configuration script
        is around, and make sure that you record the history of that
        script - i.e., put it in a CVS repository.  This has so many
        advantages over the proposed job summaries, not to mention that we
        don't need to write any code, that I don't see the point of the
        proposal. 
    
      - And what should be stored where? 
    
        The right place to store information on what modules was in the
        pipeline in a given configuration script - is in the configuration
        script.  Period. 
    
        The right place to store hard-coded parameters of the modules is
        in the setters of the modules in the configuration script.  
    
        Optionally, one can output some information on the parameters of
        the modules used to create a file in the file itself. After all,
        the parameter information of the modules is as much a part data,
        as the produced data. 
    
        Perhaps we should consider making BrModule and it's descendants
        persistent, so that one can simply write the module pipeline
        directly to the file at the end of the job.  Then all the
        information worth knowing on the modules is right there.  Of
        course when then need to go through every module and change the
        class version to some thing greater than 1, and making sure that
        caches (arrays used for intermediate storeage, file pointers,
        histograms, <shudder>ntuples</shudder>, <shudder>trees</shudder>,
        etc. and ladida) are not written to disk.  Again, this is simple,
        efficient, and we don't need to write too much code (almost
        nothing in fact). 
    
        Now, what information are we lacking?  Well, we need to tie the
        following pieces together: 
    
          - the configuration file, 
          - the ROOT version,  
          - the BRAT version,  
          - the command line options, excluding trivial options, like
            verbosity, debug level, and the like, and options not changed
            from the default, and input/output files. 
          - input files 
          - output files. 
    
       If we draw a UML diagram of what composes a job-summary (or pass),
       we arrive at 
    
    
                                                 +-------------------+
                           +------------<--------| Input/Output file |
                         1 | creator             +-------------------+
       +--------------------------+       inputs | name: string      |
       | Pass                     |----->--------| type: int         |
       +--------------------------+            * +-------------------+
       | when:  date              |                     outputs | *
       | where: string            |                             |
       | who:   string            |----->-----------------------+
       | what:  string            |-------+ 
       +--------------------------+       |
              |                           |          
              v                           v
            1 | configuration     options | *
       +---------------------+  +----------------------+  
       | Configuration file  |  | Command Line Option  |  
       +---------------------+  +----------------------+  
       | + name:     string  |  | + short_name: string |  
       | + version:  int     |  | + long_name: string  |  
       | + revision: int     |  +----------------------+  
       | + tag:      string  |           |
       +---------------------+           v
                                         |
                                       * | values  
                              +---------------------------+
                              | Command Line Option Value |
                              +---------------------------+
                              | value: string             |
                              +---------------------------+
         
      The thing to notice here, is that we have some relations back an
      forth (`creator' of File, and `inputs' and `outputs' of Pass), as
      well as some one-to-many relations (`inputs', `outputs', and
      `options' of Pass), as well as some one-to-one relations.  We'd
      really like to capture this in our job summaries - all this leads to
      a DB as the obvious choice of storing the information.  
    
      You probably think that `Ah, here he goes again, talking about ``The
      Right Way'', and that we need a DB, and ladida, but he forgets that
      someone needs to write the code, setup the DB, and test it, and
      ... - and in all, not so easy after all - lets do a Quick'n'Dirty
      hack [ed: Also known as ``Quirky'n'Defunct''] and be done with it'. 
    
      Well, there's good news for you:  The code is already written, the
      database is setup, and it little only a little testing! 
    
      The code is in `BrPassInfoManager' by Kris, and the database I
      believe is called `BrahmsPass' on rcas0005.  Ok, so there's a few
      things that isn't done altogether yet, but it isn't much, and it
      _is_ The Right Way(tm) :-) 
    
    
    Ok, the point of all this, is that you should really sit down and draw
    up a list of usage patterns, requirements and restrictions, and
    ladida, and then figure out what goes where, and why, and how, and
    when, and so on.  Then you'll see that you're really off on the wrong
    track (sorry, but you are) - and also, it immediately becomes clearer 
    what the design should be. 
         
    <snip>
    ...
    </snip>
    > * Secondly, we get an easy way of recreating old bratmain scrips. 
    
    Why recreate them if you can get them from CVS? 
    
    <snip>
    ...
    </snip>
    
    > What I'm offering to do is make the main framework for this and also
    > add Snapshot() to the most frequently used modules, filters,
    > packages etc. I can't promise I'll do them all as there are 101
    > modules in BRAT (and then filters, packages,... come in addition),
    > but I'll try to do most of it. 
    
    I'll bet that the work you need to do on `BrPassInfoManager' and
    ladida is about the same or a tad more, than the porposed work, but
    the result is 1e6+ times better :-) 
    
    > (And I have a hand-in exam in "Philosophy of Science" next week, but
    > hey...) 
    
    Good luck. (I took that ones too - I have a BA in Philosophy :-).  
    
    Yours, 
    
     ___  |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    


    This archive was generated by hypermail 2.1.5 : Wed Dec 11 2002 - 18:15:02 EST