DST framework (was Re: follow up of pid...)

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Wed Nov 20 2002 - 10:00:43 EST

    Hi, 
    
    Some comments on design of `DST' analysis framework.  These comments
    are only dealing with software issues.  
    
    The most important issue in this discussion is the design of the data
    structures.  I cannot stress that point enough.  If we have good data
    structures half the job is done.  Hence, I suggest you start thinking
    about data structures, write-up a specification (preferably in UML)
    and post it on the web for general discussion. Do _not_ start coding
    until a design has been agreed upon. 
    
    A few points I'd like to raise concerning the data structures: 
    
    * Each data structure must have a singular purpose.   That is, a
      single data structure should not be used to store many different
      kinds of data.
    
    * Each data structure must be optimised for storage.  That is, no
      redundant data members in the classes, and the smallest adequate
      data type must be used.  For example, if one has a number that will
      only take values in the range -32768 to 32767 (a PID, for example),
      a simple `Short_t' will do - an `Int_t' is overkill (not to
      mention a `Float_t').
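
      The savings are easy to quantify.  A ROOT-free sketch (ROOT
      defines these typedefs in `Rtypes.h'; plain fixed-width integers
      stand in for them here so the example compiles without ROOT):

```cpp
#include <cstdint>
#include <cstddef>

// Sketch: stand-ins for the typedefs ROOT provides in Rtypes.h.
typedef int16_t Short_t;   // 2 bytes: enough for a PID in [-32768, 32767]
typedef int32_t Int_t;     // 4 bytes
typedef float   Float_t;   // 4 bytes

// Bytes saved per entry by storing the PID as a Short_t instead of an Int_t.
inline std::size_t pidBytesSaved() { return sizeof(Int_t) - sizeof(Short_t); }
```

      Over, say, 10^8 stored tracks those two bytes per entry amount to
      roughly 200 MB on disk for this one member, before compression.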
    
    * Data structures _must_ be of fixed size.  That is, no memory
      allocation may be done in the data class (which means no
      pointers!).  This is so that `TTree' may be fully exploited.
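
      A minimal sketch of such a fixed-size data class (the name
      `DstTrack' and the maximum hit count are hypothetical): every
      member has a compile-time size, so the whole object is one
      contiguous block that `TTree' can lay out in branches without
      chasing pointers.

```cpp
#include <cstdint>

// Hypothetical DST track entry: one contiguous, fixed-size block.
struct DstTrack {
  int16_t fPid;         // particle ID fits in 16 bits (a Short_t)
  float   fPx, fPy, fPz;
  // A variable-length need is met by a fixed-capacity array plus a
  // count -- NOT by a pointer into heap memory.
  int16_t fNHits;
  int16_t fHitId[16];   // assumed maximum of 16 hits per track
};

// No pointers anywhere, so the size is known at compile time.
static_assert(sizeof(DstTrack) <= 64, "DstTrack stays small and fixed-size");
```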
    
    * If a data structure needs to refer to other data structures, it
      should do so via either a `TRef' or a `TRefArray' data member.
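
      A `TRef' resolves the referred-to object through its unique ID
      rather than through a raw pointer, which keeps the data class
      persistable.  A ROOT-free sketch of the same idea, using indices
      into the event's hit collection (all names hypothetical):

```cpp
#include <cstdint>
#include <vector>

struct DstHit { float fX, fY, fZ; };

struct DstTrack {
  int16_t fNHits = 0;
  int16_t fHitIdx[16] = {};  // indices into the event's hit list,
                             // playing the role a TRefArray plays in ROOT
};

// Resolving a reference, analogous to TRef::GetObject() in ROOT.
inline const DstHit& hitOf(const std::vector<DstHit>& hits,
                           const DstTrack& t, int i) {
  return hits[t.fHitIdx[i]];
}
```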
    
    * Use `TTree' and not `TNtuple'.  A `TNtuple' can only store
      `Float_t' values, and that's just not flexible enough, especially
      if we need to use the same data structures for Au+Au, Au+d, and
      p+p.
    
    * One must be wary of virtual member functions in the data
      structures.   Virtual function calls are expensive, and derived
      classes should be kept to an absolute minimum.
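
      The cost shows up even before any call is made: one virtual
      function forces a vtable pointer into every stored object.  A
      small illustration (class names hypothetical):

```cpp
// Two otherwise identical data classes; the second pays for a vtable
// pointer in every instance (typically 4-8 bytes plus padding), on top
// of the indirect-call overhead at analysis time.
struct PlainHit   { float fX, fY, fZ; };
struct VirtualHit { virtual ~VirtualHit() {} float fX, fY, fZ; };

static_assert(sizeof(VirtualHit) > sizeof(PlainHit),
              "the vtable pointer inflates every stored object");
```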
    
    * `Applied cuts' information is as much a part of the resulting data
      set as the physics data, and should be treated on an equal footing
      with it.   That means that `cut information' should be written to
      the output file as data structures rather than to a separate ASCII
      file.  That is easily done, utilising collections and customised
      data structures.
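
      A sketch of such a cut record (the name and members are purely
      illustrative): a plain fixed-size structure that can be written to
      the output file next to the event data, e.g. in its own `TTree'
      or `TList', instead of to a side ASCII file.

```cpp
#include <cstdint>

// Hypothetical record of the cuts a job actually applied.
struct DstCutInfo {
  float   fVtxZMin, fVtxZMax;      // vertex-z window applied, in cm
  float   fMinPt;                  // track pT threshold, in GeV/c
  int32_t fNEventsIn, fNEventsOut; // bookkeeping before/after the cuts
};

// The same structure can both drive the cut and document it.
inline bool passesVtxZ(const DstCutInfo& c, float z) {
  return c.fVtxZMin <= z && z <= c.fVtxZMax;
}
```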
    
    All these considerations should also help speed up the analysis jobs. 
    
    A few comments on the analysis code: 
    
    * Each class (module, task, whatever) may only do _one_ thing.  
    
    * The framework must allow for a high degree of customisation (a la
      `bratmain' and configuration scripts). 
    
    * If multiple loops are needed, then the best way to add information
      is to use tree friends (`TTree::AddFriend').  In that way, one can
      keep all the information without copying, and it's flexible enough
      to facilitate redoing a separate step.
    
    * Each step should be done as a separate job, resulting in new
      output files.   Multiple loops over the data in the same job are a
      waste of time.
    
    * The jobs should cut away as much information as possible as soon as
      possible. 
    
    I do _not_ recommend using the BRAT data structures and modules as the
    basis for a DST framework.  The problem is that BRAT is far too slow,
    due to far too many allocations and deallocations in the code, and
    that it's not really geared towards `TTree's.   Instead, I would
    recommend a scheme along the lines of this [1] package.
    
    Djam had some hiccups with the DB stuff - in particular, he noted that
    the comment field of a calibration revision _must_ contain an
    informative string, or it will be near impossible to figure out what
    happened in the calibration.  I could not agree more whole-heartedly,
    and anyone who does not add informative comments to their revisions
    should be rolled in tar and feathers, put on a railroad track and
    carried into the Atlantic Ocean.  Secondly, I'd like to point out that
    SQL is quite a simple language, but you don't really need to know a
    lot of it.  Instead, use our specialised tool `brdbbrowser',
    available in my CVS area [2].  There are also quite a lot of
    general-purpose MySQL browsers available out there.
    
     ___  |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    
    [1] http://cholm.home.cern.ch/cholm/root/#rootfw
    [2] cvs -d /afs/rhic/brahms/BRAHMS_CVS co \
            -d brdbbrowser brahms_app/cholm_pp/brdbbrowser
    


    This archive was generated by hypermail 2.1.5 : Wed Nov 20 2002 - 10:01:42 EST