From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Wed Nov 20 2002 - 10:00:43 EST
Hi,

Some comments on the design of the `DST' analysis framework. These comments deal only with software issues.

The most important issue in this discussion is the design of the data structures. I cannot stress that point enough. If we have good data structures, half the job is done. Hence, I suggest you start thinking about data structures, write up a specification (preferably in UML) and post it on the web for general discussion. Do _not_ start coding until a design has been agreed upon.

A few points I'd like to raise concerning the data structures:

 * Each data structure must have a single purpose. That is, a data structure should not be used to store many different kinds of data.

 * Each data structure must be optimised for storage. That is, the classes must have no redundant data members, and the smallest possible data type must be used. For example, if a number will only take values in the range -32767 to 32767 (a PID, for example), a simple `Short_t' will do - an `Int_t' is overkill (not to mention a `Float_t').

 * Data structures _must_ be of fixed size. That is, no memory allocation may be done in the data class (that means no pointers!). This is so that `TTree' may be fully exploited.

 * If a data structure needs to refer to other data structures, it should do so via a `TRef' or `TRefArray' data member.

 * Use `TTree' and not `TNtuple'. One can only store `Float_t' or `Double_t' in a `TNtuple', and that's just not flexible enough, especially if we need to use the same data structures for Au+Au, Au+d, and p+p.

 * One must be wary of virtual member functions in the data structures. Virtual function calls are expensive, and derived classes should be kept to an absolute minimum.

 * `Applied cuts' information is as much a part of the resulting data set as the physics data, and should be treated on an equal footing with it. That means that `cut information' should be written to the output file as data structures rather than to a separate ASCII file.
   That is easily done using collections and customised data structures.

All these considerations should also help speed up the analysis jobs.

A few comments on the analysis code:

 * Each class (module, task, whatever) may only do _one_ thing.

 * The framework must allow for a high degree of customisation (a la `bratmain' and configuration scripts).

 * If multiple loops are needed, the best way to add information is to use friends of trees (`TTree::AddFriend'). That way, one can keep all the information without copying, and it's flexible enough to allow redoing a single step separately.

 * Each step should be done as a separate job, resulting in new output files. Multiple loops over the data in the same job are a waste of time.

 * The jobs should cut away as much information as possible, as soon as possible.

I do _not_ recommend using the BRAT data structures and modules as the basis for a DST framework. The problem is that BRAT is far too slow, due to far too many allocations and deallocations in the code, and that it's not really geared for `TTree's. Instead, I would recommend a schema along the lines of this package [1].

Djam had some hiccups with the DB stuff. In particular, he noted that the comment field of a calibration revision _must_ contain an informative string, or it will be near impossible to figure out what happened in the calibration. I could not agree more whole-heartedly, and anyone who does not add informative comments to the revisions should be tarred and feathered, put on a railroad track and carried into the Atlantic Ocean.

Secondly, I'd like to point out that SQL is quite a simple language, but you don't really need to know much about it. Instead, use our specialised tool `brdbbrowser', available in my CVS area [2]. There are also quite a lot of general-purpose MySQL browsers available out there.

Christian Holm Christensen
-------------------------------------------------------------
Address: Sankt Hansgade 23, 1. th.     Phone:  (+45) 35 35 96 91
         DK-2200 Copenhagen N          Cell:   (+45) 24 61 85 91
         Denmark                       Office: (+45) 353 25 305
Email:   cholm@nbi.dk                  Web:    www.nbi.dk/~cholm

[1] http://cholm.home.cern.ch/cholm/root/#rootfw
[2] cvs -d /afs/rhic/brahms/BRAHMS_CVS co \
        -d brdbbrowser brahms_app/cholm_pp/brdbbrowser
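As a rough illustration of the data-structure guidelines in this message (single purpose, smallest adequate types, fixed size, no pointers, no virtual functions, cut information kept as data), here is a minimal sketch in plain C++. The names `DstTrack', `DstTrackWide', `DstCut', `MakeCut' and `ApplyCut' are hypothetical, and the `Short_t'/`Float_t'/`UInt_t' aliases are local stand-ins for the ROOT typedefs so the sketch compiles without ROOT. A real ROOT class would typically also get a ROOT dictionary before being stored as an object branch in a `TTree'.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Local stand-ins for ROOT's Rtypes.h typedefs (assumption: used here only
// so the sketch compiles without ROOT; in a ROOT build include <Rtypes.h>).
using Short_t  = short;
using Int_t    = int;
using UInt_t   = unsigned int;
using Float_t  = float;
using Double_t = double;

// Hypothetical per-track record: one purpose (kinematics + PID),
// smallest adequate types, fixed size, no pointers, no virtual functions.
struct DstTrack {
  Float_t fP;      // momentum
  Float_t fTheta;  // polar angle
  Float_t fPhi;    // azimuthal angle
  Short_t fPid;    // particle ID code, fits in a 16-bit integer
};

// Same content with oversized types, kept only to compare storage cost.
struct DstTrackWide {
  Double_t fP;
  Double_t fTheta;
  Double_t fPhi;
  Int_t    fPid;
};

// Compile-time checks. "Trivially copyable" rules out members that manage
// their own memory (e.g. std::vector); raw pointers must still be avoided
// by convention, since the compiler cannot flag them here.
static_assert(std::is_trivially_copyable<DstTrack>::value,
              "DstTrack must be plain, trivially copyable data");
static_assert(!std::is_polymorphic<DstTrack>::value,
              "DstTrack must not have virtual member functions");

// Hypothetical record of one applied cut, meant to be written to the
// output file next to the physics data instead of to an ASCII log.
struct DstCut {
  char    fName[16];  // fixed-size label: no heap allocation
  Float_t fMin;       // lower edge of the accepted range
  Float_t fMax;       // upper edge of the accepted range
  UInt_t  fNTested;   // number of candidates the cut was applied to
  UInt_t  fNPassed;   // number of candidates that passed
};

static_assert(std::is_trivially_copyable<DstCut>::value,
              "DstCut must be plain, trivially copyable data");

// Build a cut; names longer than 15 characters are truncated.
inline DstCut MakeCut(const char* name, Float_t lo, Float_t hi) {
  assert(name != nullptr);
  DstCut cut{};  // value-initialise: every byte of fName starts as '\0'
  std::strncpy(cut.fName, name, sizeof(cut.fName) - 1);
  cut.fMin = lo;
  cut.fMax = hi;
  return cut;
}

// Apply the cut to one value and update its bookkeeping counters.
inline bool ApplyCut(DstCut& cut, Float_t value) {
  ++cut.fNTested;
  const bool pass = value >= cut.fMin && value <= cut.fMax;
  if (pass) ++cut.fNPassed;
  return pass;
}
```

Because `DstCut' carries its own counters, a collection of such objects could be written once per output file (for instance in a small `TTree' of its own), so the selection that produced a data set travels with it. This is one possible layout, not a prescribed one.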
This archive was generated by hypermail 2.1.5 : Wed Nov 20 2002 - 10:01:42 EST