Data Bases in BRAHMS Analysis

  1. Purpose.
    1. Persistent storage of data between analysis steps.

       The complete analysis process may be viewed as a set of steps during which new data is derived from existing data. The new data will itself be used in later analysis steps as existing data. The data bases store new data and retrieve existing data when needed by the analysis programs.

    2. Data sharing between collaborators.

       The analysis process is carried out by many groups of people located far from each other. Much of the data produced by one group will be used by the other groups. The data base stores all data to be shared by the groups.

    3. Documentation of the analysis process.

       The data base should hold enough information to repeat any program step.

  2. What data needs persistent storage?
    1. Describe analysis scenarios.

       A scenario is a verbal description of a sequence of analysis steps with emphasis on what kind of data is used and how new data is derived in each analysis step.

    2. Make data flow diagrams.

       A data flow diagram shows data and programs connected with arrows. It shows, for each program, all data used by the program and all data produced for use in later analysis.

    3. HPSS resident data.

       Raw data from the online DAQ system is eventually transferred to the HPSS. The raw online data is organized into files of a size convenient for later retrieval and transport over WAN. The data bases will store the HPSS location of all online data; a sketch of such a record follows.
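
       As an illustration only, a record mapping one raw data file to its HPSS location might hold the fields below. The struct and field names are hypothetical, not an actual BRAHMS schema:

         // Hypothetical sketch: one record per raw online data file in HPSS.
         #include <string>

         struct HpssFileRecord {
             int         runId;     // run the file belongs to
             int         sequence;  // file sequence number within the run
             std::string hpssPath;  // location of the file inside HPSS
             long        nBytes;    // size, relevant for WAN transport
         };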
       
    4. CVS repositories.

       Programs are also data. The data produced depends as much on the programs as on the data input to the programs. All programs used in analysis must exist in the CVS repository used by the collaboration.

    5. Other data bases.

       The data bases discussed here must hold all data not in HPSS or CVS repositories. They must include references that uniquely identify HPSS and CVS resident data.

    6. Run organized data.

       Some kinds of data will exist per run. Calibrations are examples of a kind of data that will exist for selected runs. The data used for a given run might be looked up in a table with the run id as an index, as sketched below. If a data base is dedicated to a single run it may be possible to talk about a completed data base which can be made readonly.
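
       A minimal sketch of such a lookup, assuming an in-memory table keyed by run id (the entity names are invented):

         #include <map>
         #include <string>

         int main() {
             // run id -> reference to the data base entity for that run
             std::map<int, std::string> byRun;
             byRun[1234] = "calib/run1234";   // register data for run 1234

             auto it = byRun.find(1234);      // index the table by run id
             bool haveData = (it != byRun.end());
             return haveData ? 0 : 1;
         }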
       
    7. Detector organized data.

       Geometry type data is related to subdetectors. Calibration data is also subdetector specific.

    8. Organization of calibration data.

       Calibration data is calculated for selected runs and is also detector specific. For a given subdetector all calibrations are of the same class. They must be organized so that the current calibration at a given time is easy to identify and access. If accessed indirectly by table lookup it will be easy to introduce calibrations for runs in between existing calibrations. Also, improved calibrations for existing runs can easily be introduced without destroying data from which other data is derived; the sketch below illustrates this.
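
       A minimal sketch of this indirection, assuming the table is keyed by the first run each calibration is valid for (entity names are invented):

         #include <iterator>
         #include <map>
         #include <string>

         int main() {
             // first valid run -> reference to the calibration entity
             std::map<int, std::string> validFrom;
             validFrom[1000] = "tpc/gain/v1";
             validFrom[2000] = "tpc/gain/v2";

             // Current calibration for run 1500: last entry at or before it.
             auto it = validFrom.upper_bound(1500);
             std::string ref =
                 (it == validFrom.begin()) ? "" : std::prev(it)->second;
             // ref is now "tpc/gain/v1"

             // A calibration for runs in between is just a new entry...
             validFrom[1500] = "tpc/gain/v3";
             // ...and an improved one merely redirects the reference; the old
             // calibration object survives, so data derived from it stays valid.
             validFrom[1000] = "tpc/gain/v4";
         }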
       
    9. BRAHMSROOT.

       There should be one data base address from which it is possible to locate all data base components and thus all data. This will allow data base components to be moved if the appropriate links are updated.
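
       A minimal sketch, assuming BRAHMSROOT is exposed as an environment variable naming the root under which all components are linked; this particular mechanism is an assumption, not the actual BRAHMS convention:

         #include <cstdlib>
         #include <iostream>
         #include <string>

         int main() {
             const char* root = std::getenv("BRAHMSROOT");
             if (!root) { std::cerr << "BRAHMSROOT not set\n"; return 1; }

             // Every component is found relative to the single root address,
             // so a component can move if the link under the root is updated.
             std::string calibrationDb = std::string(root) + "/calibrations";
             std::string runDb         = std::string(root) + "/runs";
             std::cout << calibrationDb << "\n" << runDb << "\n";
         }
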
  3. Analysis step (working style).
     A program step is the work it takes to produce and add new data to the data base system.

    1. Programs used.

       All programs used in analysis must exist in a CVS repository before the data produced with those programs can be entered into the data base system. The complete specification of a program includes the CVS location, the module, the program name and the version; a sketch of such a specification follows.
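
       A hypothetical rendering of such a specification as a struct; the field names are assumptions, only the ingredients come from the text:

         // Hypothetical sketch of a complete program specification.
         #include <string>

         struct ProgramSpec {
             std::string cvsRepository;  // CVS location of the repository
             std::string module;         // CVS module containing the program
             std::string name;           // program name
             std::string version;        // CVS tag or revision actually used
         };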
       
    2. Data used.

       All data used in analysis must exist in the data base system before derived data can be entered into the data base. For data to be completely specified, not only its location but also the analysis step that created it must be known; see the sketch below.
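
       A matching hypothetical sketch for data, carrying the creating analysis step alongside the location:

         // Hypothetical sketch of a complete data specification.
         #include <string>

         struct DataSpec {
             std::string location;  // where the entity lives in the system
             int         stepId;    // analysis step that created the entity
         };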
       
    3. Temporary results.

       Mostly an analysis step will be the result of an iterative process. It produces temporary data base objects which are not supposed to survive and be used by other members of the collaboration. It may be that data base objects can be made private until committed or removed. Alternatively, new data are always created locally.

    4. Committing the final result.

       When the temporary data have been accepted, they should be made public or transferred to the shared data bases along with a description of the analysis step. This description includes what data and programs were used and what additional parameters were entered.
       
  4. Documentation of Analysis.
    1. Data dependencies and analysis steps.

       Except for the initial online data and data representing decisions made during the analysis process, all data depends on other data in the data bases. These dependencies also belong in the data base and are descriptions of the analysis steps. A step is completely specified if the information in the step and all input data exist in the data base; the sketch below combines the pieces.
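
       A hypothetical sketch of a stored step description, combining the program and data specifications sketched earlier (all names invented):

         #include <string>
         #include <vector>

         struct AnalysisStep {
             int                      stepId;      // unique id of this step
             std::string              program;     // reference to program used
             std::vector<std::string> inputs;      // references to input data
             std::vector<std::string> parameters;  // parameters entered
             std::vector<std::string> outputs;     // data this step produced
         };
         // The step is repeatable once this record, the program and all
         // referenced inputs exist in the data base system.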
       
    2. Authorization of data.

       To avoid ambiguities and reduce mistakes in the analysis process there may be a need to introduce the concept of authorized or certified data. E.g. there might be exactly one authorized calibration for any given run and subdetector. Indirect access to data via authorized tables of references may be all that is needed; a sketch follows.
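
       A minimal sketch of an authorized reference table, assuming a (run, subdetector) key with exactly one entry per key (all names invented):

         #include <map>
         #include <string>
         #include <utility>

         int main() {
             // (run id, subdetector) -> the single authorized calibration
             std::map<std::pair<int, std::string>, std::string> authorized;
             authorized[{1234, "TPC"}] = "tpc/gain/v4";

             // Certifying an improved calibration only swaps the reference;
             // the old object is kept, so earlier results stay traceable.
             authorized[{1234, "TPC"}] = "tpc/gain/v5";
         }
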
  5. Location of data bases.
    1. One server many clients.
    2. Growing data bases.
    3. Readonly data bases.
    4. Local copies of data bases.
    5. Committing to the master data base.
    6. Data base identification.
    7. Moving data bases.
  6. What functionality (API).
    1. Relational or Object oriented.
    2. Open and Create Data base.
    3. Put, Get and Remove entity.
    4. Replace and Update entity.
    5. Check entity.
    6. Iterate entities.
    7. Select and iterate entities.
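
       The items above only name the desired operations. As a non-authoritative illustration, an interface offering them might look as follows; the entity layout, names and signatures are all assumptions:

         #include <functional>
         #include <string>

         struct Entity { std::string key; std::string payload; };

         class DataBase {
         public:
             virtual ~DataBase() {}
             virtual bool open(const std::string& address) = 0;    // existing
             virtual bool create(const std::string& address) = 0;  // new
             virtual bool put(const Entity& e) = 0;                // add
             virtual bool get(const std::string& key, Entity& e) = 0;
             virtual bool remove(const std::string& key) = 0;
             virtual bool replace(const Entity& e) = 0;            // overwrite
             virtual bool update(const Entity& e) = 0;             // modify
             virtual bool check(const std::string& key) = 0;       // exists?
             // Visit every entity, or only those accepted by a predicate.
             virtual void iterate(
                 const std::function<void(const Entity&)>& visit) = 0;
             virtual void select(
                 const std::function<bool(const Entity&)>& pred,
                 const std::function<void(const Entity&)>& visit) = 0;
         };
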
  7. Choice of data base machinery.
    1. Complexity of use.
    2. Functionality needed.
    3. Data security.
    4. Objectivity.
    5. MySQL.
    6. ROOT.
    7. gdbm.
  8. Dbase, a small API using gdbm.
    1. Separation of application.
    2. Interface methods.
    3. DbaseEntity.
    4. DbaseVersionedEntity.
    5. Demo programs.
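
       The Dbase classes themselves are not shown in this note, so the following is only a rough sketch of the kind of calls such a wrapper rests on, using the standard gdbm C API; the keys and values are invented:

         #include <gdbm.h>
         #include <cstdlib>
         #include <cstring>
         #include <iostream>

         int main() {
             // Open (or create) the data base file.
             GDBM_FILE db =
                 gdbm_open("demo.gdbm", 0, GDBM_WRCREAT, 0644, nullptr);
             if (!db) return 1;

             char keyStr[] = "run1234/tpc/gain";
             char valStr[] = "v4";
             datum key = { keyStr, (int)std::strlen(keyStr) };
             datum val = { valStr, (int)std::strlen(valStr) };

             gdbm_store(db, key, val, GDBM_REPLACE);   // put / replace entity

             datum out = gdbm_fetch(db, key);          // get entity
             if (out.dptr) {
                 std::cout.write(out.dptr, out.dsize) << "\n";
                 std::free(out.dptr);   // caller frees fetched data
             }
             gdbm_close(db);
         }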