From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Wed Dec 11 2002 - 17:55:25 EST
Hi Bjorn et al (in particular Djam, Kris, and Flemming), First off, thank you for opening this bag (sack?) of worms - it's an important issue and needs careful deliberation before implementation. But, I'm getting ahead of my self here - first some comments to your proposal. Bjorn H Samset <bjornhs@rcf2.rhic.bnl.gov> wrote concerning Proposing BRAT/bratmain addition (PLEASE READ!) [Tue, 10 Dec 2002 14:24:28 -0500 (EST)] ---------------------------------------------------------------------- > > Hello dev'ils. > <snip> ... </snip> > //_________________________________________________________________ > void BrTileRdoModule::Snapshot(ofstream &f, Option_t* option) > { > // Output info on all settable members to file > // Must be overloaded by all modules! > TString opt(option); > opt.ToLower(); > > if (opt.Contains("e")) > f << "Module: " << IsA()->GetName() << " " << GetName() << " " << > GetTitle() << endl; > > BrMultRdoModule::Snapshot(f); > f << PrintIntMember("AdcGapLimit",fAdcGapLimit) << endl; > f << PrintIntMember("SaturationChannel",fSaturationChannel) << endl; > > if (opt.Contains("e")) { > f << "# CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $ > $Author: sanders $" << endl; > f << endl; > } > > } > > (PrintIntMember and similar are functions of BrModule...) Argh! What's so difficult in writing f << setw(8) << fIntegerMember << std::endl; compared to f << PrintIntmMember(fIntegerMember) << std::endl; The additional cluttering of the base class isn't worth the gain. <snip> ... </snip> > > A test bratmain script which contains only the BrTileRdoModule then looks > like this: > > --------------------------------------------------- > Module: BrEventIO BRAHMS ROOT IO File eventInput > > Module: BrTileRdoModule MultTile MultTile > AdcGapLimit 4096 > SaturationChannel 30000 > ThresholdFactor 5.000000 > OutlierMethod 3 > SingleAdcLow 0.000000 > SingleAdcHigh 500.000000 > SingleAdcComp 10.000000 > AdcGapLimit 4096 > SaturationChannel 30000 > # CVS info: $Revision: 1.19 $ $Date: 2002/09/19 22:27:04 $ $Author: > sanders $ This format isn't particularly parser-friendly. Instead, we should mimic the `TObject::SavePrimitive' member function - that is save proper C++ statements: //__________________________________________________________________ // Module: BrTileRdoModule BrTileRdoModule* tileRdoModule = new BrTileRdoModule("MultTile", "TMA reducer"); tileRdoModule->SetAdcGapLimit(4096); tileRdoModule->SetSaturationChannel(30000); tileRdoModule->SetThresholdFactor(5); tileRdoModule->SetOutlierMethod(BrMultRdoModule::kNoCorrection); tileRdoModule->SetSingleAdcLow(0); tileRdoModule->SetSingleAdcHigh(500); tileRdoModule->SetSingleAdcComp(10); mainModule->AddModule(tileRdoModule); ... In this way, the log file is directly `executable' via `bratmain'. > > Module: BrEventIO BRAHMS ROOT IO File ouputModule > -------------------------------------------------- > > I'll get rid of the BrEventIOs in the next implementation - they shouldn't > be there I think (see below for why). > > Also note that two of the parameters occur twice - that's because > BrTileRdoModule has two setters that are identical to its parent class > BrMultRdoModule. This framework also incourages cleaner coding ;-) The particular code you refer to _is_ clean. If you check the usage of the particular data members, you'll see the point of the overloaded functions: provide specific defaults for TMA/SMA. The problem is rather in the code you sent - the Br{Tile,Si}RdoModule shouldn't emit the setting of `fSaturationChannel' etc., as it is BrMultRdoModule that owns those parameters, not Br{Tile,Si}RdoModule. I guess you're confusing data members and member functions, and which one defines `ownership'. In the future, please check your code before pinning `unclean coding' on others code, thank you. > The point of this excercise is two-fold: > * First of all we get proper documentation of all relevant > parameters for a given data reconstruction. This has been missing > for a while and caused problems for both me and several others I > have talked to. Here's my main hick-up with this approach. What is the need of this log file? What's the difference between the log file and the configuration file? I'm afraid I don't see any. Having the script is just as good (in fact better) than having the log file. After all, the script does the job. This really points to the problems of your proposal. Please consider the following: * When do you want the information? - Do you want to take an output file, and figure out how this file was created? In that case, the values of various parameters should be stored in the file itself, as it is really part of the data, just as cuts are really part of the data analysis results. - Do you want to redo the file that you once had but is now gone, or make a similar pass on some other input data? In that case, all you need is the configuration script and the command line options to `bratmain'. - Do you want this information to be used for collaboration-wide production passes or for single user jobs? In the former case, you can make a few assumptions that really simplifies the process - for example, you can assume that the job was executed at RCF in either CRS or CAS GNU/Linux farms. You can also assume that the official database is available, and so on. In the latter case, you can basically just pipe the output from the `bratmain' run into a file (with the appropriate verbosity level), and make your own personal filling system. That output (with verbosity >= 3) is all you really need. * What kind of information do you need? You really only need to know what configuration script was used, it's version number (remember, BrMainModule has fields for setting various version information, as well as author information, and of course as short description) or CVS revision number, the command line options passed to `bratmain', the ROOT and BRAT version. Given these pieces of information, you can easily redo a pass: cd ~/src export CVSROOT=:pserver:cvs@root.cern.ch:/user/cvs cvs login Password: cvs cvs checkout -d root-<version> root -r <version-tag> cd root-<version> ./configure <arch> --prefix=${HOME} make make install cd .. export PATH=${HOME}/bin:${PATH} export CVSROOT=/afs/brhic/brahms/BRAHMS_CVS cvs checkout -d brat-<version> brat -r <version-tag> cd brat-<version> ./autogen --prefix=${HOME} make make install cd .. cvs checkout -d brat-<version> brat/scripts/<configuration-script> \ -r <configuration-script-revision-or-tag> cp brat-<version>/scripts/<configuration-script> ~/scripts cd /scr/<user-name> bratmain scripts/<configuration-script> \ <command-line-options> This of course requires a certain amount of discipline when changing the official scripts in BRAT (and they _must_ be in BRAT), namely: - If a parameter is changed on a module, the version (set in BrMainModule) must be changed, and the script must be committed to CVS immediately. Optionally, a tag can be made. - If a parameter is changed from hard-coded to set-by-command-line-option, the version (set in BrMainModule) must be changed, and the script must be committed to CVS immediately. Optionally, a tag can be made. - When adding a module, changing the order of the modules, removing a module, or similar, the version (set in BrMainModule) must be changed, and the script must be committed to CVS immediately. Optionally, a tag can be made. - ... probably a few other points too. For calibration constants, and the like, we should record in the output file: - the name of the detector, - the name of the parameter, - the revision number of the parameter, - ... other stuff. Again, this should really be part of the output file rather than a log or job summary, as it is an intricate part of the data. * Where you do you want to store the information? - If you want to make plain text ASCII files, how should you manage these files? To be quite honest, I think these plain text ASCII files are a pain in the behind that no-one in their right mind would like to manage. Also, you need to make special parsers to (re)make the configuration scripts and command line options. It is just so much simpler to make sure your configuration script is around, and make sure that you record the history of that script - i.e., put it in a CVS repository. This has so many advantages over the proposed job summaries, not to mention that we don't need to write any code, that I don't see the point of the proposal. - And what should be stored where? The right place to store information on what modules was in the pipeline in a given configuration script - is in the configuration script. Period. The right place to store hard-coded parameters of the modules is in the setters of the modules in the configuration script. Optionally, one can output some information on the parameters of the modules used to create a file in the file itself. After all, the parameter information of the modules is as much a part data, as the produced data. Perhaps we should consider making BrModule and it's descendants persistent, so that one can simply write the module pipeline directly to the file at the end of the job. Then all the information worth knowing on the modules is right there. Of course when then need to go through every module and change the class version to some thing greater than 1, and making sure that caches (arrays used for intermediate storeage, file pointers, histograms, <shudder>ntuples</shudder>, <shudder>trees</shudder>, etc. and ladida) are not written to disk. Again, this is simple, efficient, and we don't need to write too much code (almost nothing in fact). Now, what information are we lacking? Well, we need to tie the following pieces together: - the configuration file, - the ROOT version, - the BRAT version, - the command line options, excluding trivial options, like verbosity, debug level, and the like, and options not changed from the default, and input/output files. - input files - output files. If we draw a UML diagram of what composes a job-summary (or pass), we arrive at +-------------------+ +------------<--------| Input/Output file | 1 | creator +-------------------+ +--------------------------+ inputs | name: string | | Pass |----->--------| type: int | +--------------------------+ * +-------------------+ | when: date | outputs | * | where: string | | | who: string |----->-----------------------+ | what: string |-------+ +--------------------------+ | | | v v 1 | configuration options | * +---------------------+ +----------------------+ | Configuration file | | Command Line Option | +---------------------+ +----------------------+ | + name: string | | + short_name: string | | + version: int | | + long_name: string | | + revision: int | +----------------------+ | + tag: string | | +---------------------+ v | * | values +---------------------------+ | Command Line Option Value | +---------------------------+ | value: string | +---------------------------+ The thing to notice here, is that we have some relations back an forth (`creator' of File, and `inputs' and `outputs' of Pass), as well as some one-to-many relations (`inputs', `outputs', and `options' of Pass), as well as some one-to-one relations. We'd really like to capture this in our job summaries - all this leads to a DB as the obvious choice of storing the information. You probably think that `Ah, here he goes again, talking about ``The Right Way'', and that we need a DB, and ladida, but he forgets that someone needs to write the code, setup the DB, and test it, and ... - and in all, not so easy after all - lets do a Quick'n'Dirty hack [ed: Also known as ``Quirky'n'Defunct''] and be done with it'. Well, there's good news for you: The code is already written, the database is setup, and it little only a little testing! The code is in `BrPassInfoManager' by Kris, and the database I believe is called `BrahmsPass' on rcas0005. Ok, so there's a few things that isn't done altogether yet, but it isn't much, and it _is_ The Right Way(tm) :-) Ok, the point of all this, is that you should really sit down and draw up a list of usage patterns, requirements and restrictions, and ladida, and then figure out what goes where, and why, and how, and when, and so on. Then you'll see that you're really off on the wrong track (sorry, but you are) - and also, it immediately becomes clearer what the design should be. <snip> ... </snip> > * Secondly, we get an easy way of recreating old bratmain scrips. Why recreate them if you can get them from CVS? <snip> ... </snip> > What I'm offering to do is make the main framework for this and also > add Snapshot() to the most frequently used modules, filters, > packages etc. I can't promise I'll do them all as there are 101 > modules in BRAT (and then filters, packages,... come in addition), > but I'll try to do most of it. I'll bet that the work you need to do on `BrPassInfoManager' and ladida is about the same or a tad more, than the porposed work, but the result is 1e6+ times better :-) > (And I have a hand-in exam in "Philosophy of Science" next week, but > hey...) Good luck. (I took that ones too - I have a BA in Philosophy :-). Yours, ___ | Christian Holm Christensen |_| | ------------------------------------------------------------- | | Address: Sankt Hansgade 23, 1. th. Phone: (+45) 35 35 96 91 _| DK-2200 Copenhagen N Cell: (+45) 24 61 85 91 _| Denmark Office: (+45) 353 25 305 ____| Email: cholm@nbi.dk Web: www.nbi.dk/~cholm | |
This archive was generated by hypermail 2.1.5 : Wed Dec 11 2002 - 18:15:02 EST