Re: Proposing BRAT/bratmain addition (PLEASE READ!)

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Wed Dec 11 2002 - 17:55:25 EST
Next message: Marco Germinario: "coalescence"
Previous message: Hironori Ito: "automake question"
In reply to: Bjorn H Samset: "Proposing BRAT/bratmain addition (PLEASE READ!)"
Next in thread: Flemming Videbaek: "Re: Proposing BRAT/bratmain addition (PLEASE READ!)"
Reply: Flemming Videbaek: "Re: Proposing BRAT/bratmain addition (PLEASE READ!)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Hi Bjorn et al (in particular Djam, Kris, and Flemming), 

First off, thank you for opening this bag (sack?) of worms - it's an
important issue and needs careful deliberation before implementation.
But, I'm getting ahead of my self here - first some comments to your
proposal. 

Bjorn H Samset <bjornhs@rcf2.rhic.bnl.gov> wrote concerning
  Proposing BRAT/bratmain addition (PLEASE READ!) [Tue, 10 Dec 2002 14:24:28 -0500 (EST)] 
----------------------------------------------------------------------
> 
> Hello dev'ils.
> 
<snip>
...
</snip>
> //_________________________________________________________________
> void BrTileRdoModule::Snapshot(ofstream &f, Option_t* option)
> {
>   // Output info on all settable members to file
>   // Must be overloaded by all modules!
>   TString opt(option);
>   opt.ToLower();
> 
>   if (opt.Contains("e"))
>     f << "Module: " << IsA()->GetName() << " " << GetName() << " " <<
> GetTitle() << endl;
> 
>   BrMultRdoModule::Snapshot(f);
>   f << PrintIntMember("AdcGapLimit",fAdcGapLimit) << endl;
>   f << PrintIntMember("SaturationChannel",fSaturationChannel) << endl;
> 
>   if (opt.Contains("e")) {
>     f << "# CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $
> $Author: sanders $" << endl;
>     f << endl;
>   }
> 
> }
> 
> (PrintIntMember and similar are functions of BrModule...)

Argh! What's so difficult in writing 

  f << setw(8) << fIntegerMember << std::endl;

compared to 

  f << PrintIntmMember(fIntegerMember) << std::endl;

The additional cluttering of the base class isn't worth the gain.  

<snip>
...
</snip>
> 
> A test bratmain script which contains only the BrTileRdoModule then looks
> like this:
> 
> ---------------------------------------------------
> Module: BrEventIO BRAHMS ROOT IO File eventInput
> 
> Module: BrTileRdoModule MultTile MultTile
>          AdcGapLimit                4096
>    SaturationChannel               30000
>      ThresholdFactor            5.000000
>        OutlierMethod                   3
>         SingleAdcLow            0.000000
>        SingleAdcHigh          500.000000
>        SingleAdcComp           10.000000
>          AdcGapLimit                4096
>    SaturationChannel               30000
> # CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $ $Author:
> sanders $

This format isn't particularly parser-friendly.  Instead, we should
mimic the `TObject::SavePrimitive' member function - that is save
proper C++ statements: 

  //__________________________________________________________________
  // Module: BrTileRdoModule
  BrTileRdoModule* tileRdoModule = 
    new BrTileRdoModule("MultTile", "TMA reducer");
  tileRdoModule->SetAdcGapLimit(4096);
  tileRdoModule->SetSaturationChannel(30000);
  tileRdoModule->SetThresholdFactor(5);
  tileRdoModule->SetOutlierMethod(BrMultRdoModule::kNoCorrection);
  tileRdoModule->SetSingleAdcLow(0);
  tileRdoModule->SetSingleAdcHigh(500);
  tileRdoModule->SetSingleAdcComp(10);
  mainModule->AddModule(tileRdoModule);
  ... 

In this way, the log file is directly `executable' via `bratmain'. 

> 
> Module: BrEventIO BRAHMS ROOT IO File ouputModule
> --------------------------------------------------
> 
> I'll get rid of the BrEventIOs in the next implementation - they shouldn't
> be there I think (see below for why).
> 
> Also note that two of the parameters occur twice - that's because
> BrTileRdoModule has two setters that are identical to its parent class
> BrMultRdoModule. This framework also incourages cleaner coding ;-)

The particular code you refer to _is_ clean.  If you check the usage
of the particular data members, you'll see the point of the overloaded
functions: provide specific defaults for TMA/SMA.  The problem is
rather in the code you sent - the Br{Tile,Si}RdoModule shouldn't emit
the setting of `fSaturationChannel' etc., as it is BrMultRdoModule
that owns those parameters, not Br{Tile,Si}RdoModule.   I guess you're
confusing data members and member functions, and which one defines
`ownership'. 

In the future, please check your code before pinning `unclean coding'
on others code, thank you. 

> The point of this excercise is two-fold:
> * First of all we get proper documentation of all relevant
>   parameters for a given data reconstruction. This has been missing
>   for a while and caused problems for both me and several others I
>   have talked to. 

Here's my main hick-up with this approach.  

What is the need of this log file?  What's the difference between the
log file and the configuration file?  I'm afraid I don't see any.  

Having the script is just as good (in fact better) than having the log
file.  After all, the script does the job.   This really points to
the problems of your proposal.  Please consider the following: 

 * When do you want the information? 

   - Do you want to take an output file, and figure out how this file
     was created?   

     In that case, the values of various parameters should be stored
     in the file itself, as it is really part of the data, just as
     cuts are really part of the data analysis results. 

   - Do you want to redo the file that you once had but is now gone,
     or make a similar pass on some other input data? 

     In that case, all you need is the configuration script and the
     command line options to `bratmain'. 

   - Do you want this information to be used for collaboration-wide
     production passes or for single user jobs? 

     In the former case, you can make a few assumptions that really
     simplifies the process - for example, you can assume that the job
     was executed at RCF in either CRS or CAS GNU/Linux farms.  You
     can also assume that the official database is available, and so
     on. 

     In the latter case, you can basically just pipe the output from
     the `bratmain' run into a file (with the appropriate verbosity
     level), and make your own personal filling system.  That output
     (with verbosity >= 3) is all you really need. 

* What kind of information do you need? 

  You really only need to know what configuration script was used,
  it's version number (remember, BrMainModule has fields for setting
  various version information, as well as author information, and
  of course as short description) or CVS revision number, the command
  line options passed to `bratmain', the ROOT and BRAT version. 

  Given these pieces of information, you can easily redo a pass:

    cd  ~/src

    export CVSROOT=:pserver:cvs@root.cern.ch:/user/cvs 
    cvs login 
    Password: cvs 
    cvs checkout -d root-<version> root -r <version-tag>
    cd root-<version> 
    ./configure <arch> --prefix=${HOME} 
    make 
    make install

    cd ..
    export PATH=${HOME}/bin:${PATH}

    export CVSROOT=/afs/brhic/brahms/BRAHMS_CVS 
    cvs checkout -d brat-<version> brat -r <version-tag> 
    cd brat-<version> 
    ./autogen --prefix=${HOME}
    make 
    make install 

    cd .. 

    cvs checkout -d brat-<version> brat/scripts/<configuration-script> \
       -r <configuration-script-revision-or-tag>
    cp brat-<version>/scripts/<configuration-script> ~/scripts
  
    cd /scr/<user-name>
    bratmain scripts/<configuration-script> \
      <command-line-options>
    
  This of course requires a certain amount of discipline when changing
  the official scripts in BRAT (and they _must_ be in BRAT), namely: 

    - If a parameter is changed on a module, the version (set in
      BrMainModule) must be changed, and the script must be committed  
      to CVS immediately.  Optionally, a tag can be made. 

    - If a parameter is changed from hard-coded to
      set-by-command-line-option, the version (set in BrMainModule)
      must be changed, and the script must be committed to 
      CVS immediately.  Optionally, a tag can be made.

    - When adding a module, changing the order of the modules,
      removing a module, or similar, the version (set in BrMainModule)
      must be changed, and the script must be committed to CVS
      immediately.  Optionally, a tag can be made.

    - ... probably a few other points too. 

  For calibration constants, and the like, we should record in the
  output file: 
  
    - the name of the detector,
    - the name of the parameter, 
    - the revision number of the parameter, 
    - ... other stuff. 

  Again, this should really be part of the output file rather than a
  log or job summary, as it is an intricate part of the data. 

* Where you do you want to store the information? 

  - If you want to make plain text ASCII files, how should you manage
    these files?   

    To be quite honest, I think these plain text ASCII files are a
    pain in the behind that no-one in their right mind would like to
    manage.  Also, you need to make special parsers to (re)make the
    configuration scripts and command line options.  

    It is just so much simpler to make sure your configuration script
    is around, and make sure that you record the history of that
    script - i.e., put it in a CVS repository.  This has so many
    advantages over the proposed job summaries, not to mention that we
    don't need to write any code, that I don't see the point of the
    proposal. 

  - And what should be stored where? 

    The right place to store information on what modules was in the
    pipeline in a given configuration script - is in the configuration
    script.  Period. 

    The right place to store hard-coded parameters of the modules is
    in the setters of the modules in the configuration script.  

    Optionally, one can output some information on the parameters of
    the modules used to create a file in the file itself. After all,
    the parameter information of the modules is as much a part data,
    as the produced data. 

    Perhaps we should consider making BrModule and it's descendants
    persistent, so that one can simply write the module pipeline
    directly to the file at the end of the job.  Then all the
    information worth knowing on the modules is right there.  Of
    course when then need to go through every module and change the
    class version to some thing greater than 1, and making sure that
    caches (arrays used for intermediate storeage, file pointers,
    histograms, <shudder>ntuples</shudder>, <shudder>trees</shudder>,
    etc. and ladida) are not written to disk.  Again, this is simple,
    efficient, and we don't need to write too much code (almost
    nothing in fact). 

    Now, what information are we lacking?  Well, we need to tie the
    following pieces together: 

      - the configuration file, 
      - the ROOT version,  
      - the BRAT version,  
      - the command line options, excluding trivial options, like
        verbosity, debug level, and the like, and options not changed
        from the default, and input/output files. 
      - input files 
      - output files. 

   If we draw a UML diagram of what composes a job-summary (or pass),
   we arrive at 


                                             +-------------------+
                       +------------<--------| Input/Output file |
                     1 | creator             +-------------------+
   +--------------------------+       inputs | name: string      |
   | Pass                     |----->--------| type: int         |
   +--------------------------+            * +-------------------+
   | when:  date              |                     outputs | *
   | where: string            |                             |
   | who:   string            |----->-----------------------+
   | what:  string            |-------+ 
   +--------------------------+       |
          |                           |          
          v                           v
        1 | configuration     options | *
   +---------------------+  +----------------------+  
   | Configuration file  |  | Command Line Option  |  
   +---------------------+  +----------------------+  
   | + name:     string  |  | + short_name: string |  
   | + version:  int     |  | + long_name: string  |  
   | + revision: int     |  +----------------------+  
   | + tag:      string  |           |
   +---------------------+           v
                                     |
                                   * | values  
                          +---------------------------+
                          | Command Line Option Value |
                          +---------------------------+
                          | value: string             |
                          +---------------------------+
     
  The thing to notice here, is that we have some relations back an
  forth (`creator' of File, and `inputs' and `outputs' of Pass), as
  well as some one-to-many relations (`inputs', `outputs', and
  `options' of Pass), as well as some one-to-one relations.  We'd
  really like to capture this in our job summaries - all this leads to
  a DB as the obvious choice of storing the information.  

  You probably think that `Ah, here he goes again, talking about ``The
  Right Way'', and that we need a DB, and ladida, but he forgets that
  someone needs to write the code, setup the DB, and test it, and
  ... - and in all, not so easy after all - lets do a Quick'n'Dirty
  hack [ed: Also known as ``Quirky'n'Defunct''] and be done with it'. 

  Well, there's good news for you:  The code is already written, the
  database is setup, and it little only a little testing! 

  The code is in `BrPassInfoManager' by Kris, and the database I
  believe is called `BrahmsPass' on rcas0005.  Ok, so there's a few
  things that isn't done altogether yet, but it isn't much, and it
  _is_ The Right Way(tm) :-) 


Ok, the point of all this, is that you should really sit down and draw
up a list of usage patterns, requirements and restrictions, and
ladida, and then figure out what goes where, and why, and how, and
when, and so on.  Then you'll see that you're really off on the wrong
track (sorry, but you are) - and also, it immediately becomes clearer 
what the design should be. 
     
<snip>
...
</snip>
> * Secondly, we get an easy way of recreating old bratmain scrips. 

Why recreate them if you can get them from CVS? 

<snip>
...
</snip>

> What I'm offering to do is make the main framework for this and also
> add Snapshot() to the most frequently used modules, filters,
> packages etc. I can't promise I'll do them all as there are 101
> modules in BRAT (and then filters, packages,... come in addition),
> but I'll try to do most of it. 

I'll bet that the work you need to do on `BrPassInfoManager' and
ladida is about the same or a tad more, than the porposed work, but
the result is 1e6+ times better :-) 

> (And I have a hand-in exam in "Philosophy of Science" next week, but
> hey...) 

Good luck. (I took that ones too - I have a BA in Philosophy :-).  

Yours, 

 ___  |  Christian Holm Christensen 
  |_| |	 -------------------------------------------------------------
    | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
     _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
    _|	          Denmark                    Office: (+45) 353  25 305
 ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
 | |
Next message: Marco Germinario: "coalescence"
Previous message: Hironori Ito: "automake question"
In reply to: Bjorn H Samset: "Proposing BRAT/bratmain addition (PLEASE READ!)"
Next in thread: Flemming Videbaek: "Re: Proposing BRAT/bratmain addition (PLEASE READ!)"
Reply: Flemming Videbaek: "Re: Proposing BRAT/bratmain addition (PLEASE READ!)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
This archive was generated by hypermail 2.1.5 : Wed Dec 11 2002 - 18:15:02 EST