Data Bases in Brahms Analysis
-
Purpose.
-
Persistent storage of data between analysis steps.
The complete analysis process may be viewed as a
set of steps during which new data is derived from existing data. The new
data will itself be used in later analysis steps as existing data. The
data bases store new data and retrieve existing data when needed by the
analysis programs.
-
Data sharing between collaborators.
The analysis process is carried out by many groups
of people located far from each other. Much of the data produced by one
group will be used by the other groups. The data base stores all data to
be shared by the groups.
-
Documentation of the analysis process.
The data base should hold enough information to
repeat any program step.
-
What data needs persistent storage?
-
Describe analysis scenarios.
A scenario is a verbal description of a sequence
of analysis steps with emphasis on what kind of data is used and how new
data is derived in each analysis step.
-
Make data flow diagrams.
A data flow diagram shows data and programs connected
with arrows. It shows, for each program, all data used by the program
and all data produced for use in later analysis.
-
HPSS resident data.
Raw data from the online DAQ system is eventually
transferred to the HPSS. The raw online data is organized into files of
a size convenient for later retrieval and transport over WAN. The data
bases will store the HPSS location of all online data.
-
CVS repositories.
Programs are also data. The data produced depends as
much on the programs as on the data input to the programs. All programs
used in analysis must exist in the CVS repository used by the collaboration.
-
Other data bases.
The data bases discussed here must hold all data
not in HPSS or CVS repositories. They must include references that
uniquely identify HPSS and CVS resident data.
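As a rough illustration, such a reference could be a small record
holding everything needed to locate the external file again. A sketch
in C++, with purely hypothetical field names (a CVS reference would be
analogous; see the program specification sketch further down):

    // Sketch of a unique reference to HPSS resident data; the field
    // names are illustrative assumptions, not an existing format.
    #include <string>

    struct HpssRef {
        std::string path;     // full HPSS path of the raw data file
        long        nbytes;   // file size, usable as a consistency check
    };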
-
Run organized data.
Some kinds of data will exist per run. Calibrations
are an example of data that will exist only for selected runs. The data
used for a given run might be looked up in a table with the run id as an
index. If a data base is dedicated to a single run, it may be possible
to speak of a completed data base, which can then be made read-only.
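A minimal sketch of such a lookup, assuming a simple in-memory table
keyed by run number (the run ids and file names are invented):

    // Looking up per-run data with the run id as index.
    #include <map>
    #include <string>

    typedef int RunId;

    int main() {
        std::map<RunId, std::string> runTable;   // run -> data base file
        runTable[1042] = "run1042.db";           // completed, read-only
        runTable[1043] = "run1043.db";           // still growing

        std::map<RunId, std::string>::iterator it = runTable.find(1042);
        if (it != runTable.end()) {
            // open it->second read-only here
        }
        return 0;
    }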
-
Detector organized data.
Geometry-type data is related to subdetectors.
Calibration data is also subdetector specific.
-
Organization of calibration data.
Calibration data is calculated for selected runs
and is also detector specific. For a given subdetector all calibrations
are of the same class. They must be organized so that the calibration
current at a given time is easy to identify and access. If calibrations
are accessed indirectly by table lookup, it will be easy to introduce
calibrations for runs in between existing ones, and improved versions
of existing calibrations can be introduced without destroying data from
which other data has been derived.
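One possible realization, sketched under the assumption that the lookup
table is keyed by subdetector and by the first run for which a
calibration is valid; every new or improved calibration is simply a new
entry, so nothing is ever destroyed:

    // Indirect calibration lookup via a validity table (a sketch).
    #include <map>
    #include <string>
    #include <utility>

    typedef std::pair<std::string, int> CalKey;      // (subdetector, start run)
    typedef std::map<CalKey, std::string> CalTable;  // value: calibration ref

    // Return the calibration valid for 'run': the entry with the
    // largest start run not greater than 'run' for that subdetector.
    std::string findCalibration(const CalTable& t,
                                const std::string& det, int run) {
        CalTable::const_iterator best = t.end();
        for (CalTable::const_iterator it = t.begin(); it != t.end(); ++it)
            if (it->first.first == det && it->first.second <= run)
                best = it;   // map ordering keeps the latest match
        return best == t.end() ? std::string() : best->second;
    }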
-
BRAHMSROOT.
There should be one data base address from which it is possible to locate
all data base components and thus all data. This will allow data base
components to be moved if the appropriate links are updated.
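A sketch of how this could work, assuming (purely for illustration)
that BRAHMSROOT names a small link file with one "component location"
pair per line:

    // Resolve data base components from the one known address.
    #include <cstdlib>
    #include <fstream>
    #include <map>
    #include <string>

    std::map<std::string, std::string> readLinks() {
        std::map<std::string, std::string> links;
        const char* root = std::getenv("BRAHMSROOT"); // hypothetical variable
        if (root) {
            std::ifstream in(root);
            std::string name, location;
            while (in >> name >> location)
                links[name] = location;  // moving a component = editing a line
        }
        return links;
    }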
-
Analysis step (working style).
An analysis step is the work it takes to produce
new data and add it to the data base system.
-
Programs used.
All programs used in analysis must exist in a CVS
repository before the data produced with those programs can be entered
into the data base system. The complete specification of a program
includes the CVS location, the module, the program name, and the version.
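In code, the complete specification could look like the following
sketch (the struct and field names are assumptions, not an existing
interface):

    // Complete specification of a program used in an analysis step.
    #include <string>

    struct ProgramRef {
        std::string cvsRoot;   // location of the CVS repository (CVSROOT)
        std::string module;    // CVS module containing the program
        std::string name;      // program name within the module
        std::string version;   // CVS tag identifying the version used
    };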
-
Data used.
All data used in analysis must exist in the data
base system before derived data can be entered into the data base. For
data to be completely specified, not only must its location be known but
also the analysis step that created it.
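A corresponding sketch for data, again with invented names:

    // Complete specification of data used in an analysis step.
    #include <string>

    struct DataRef {
        std::string location;  // data base component and key of the data
        std::string stepId;    // analysis step that created the data
    };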
-
Temporary results.
An analysis step will mostly be the result of an
iterative process. It produces temporary data base objects which are not
supposed to survive and be used by other members of the collaboration.
It may be that data base objects can be made private until committed or
removed. Alternatively, new data could always be created locally first.
-
Committing the final result.
When the temporary data have been accepted, they
should be made public or transferred to the shared data bases along
with a description of the analysis step. This description includes what
data and programs were used and what additional parameters were entered.
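The committed description might be a record along these lines (a sketch
with invented names):

    // Description of one analysis step, committed with its results.
    #include <string>
    #include <vector>

    struct AnalysisStep {
        std::vector<std::string> programs;    // CVS program references used
        std::vector<std::string> inputs;      // input data references used
        std::vector<std::string> parameters;  // additional parameters entered
        std::vector<std::string> outputs;     // data produced by the step
    };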
-
Documentation of Analysis.
-
Data dependencies and analysis steps.
Except for the initial online data and data representing
decisions made during the analysis process, all data depends on other
data in the data bases. These dependencies also belong in the data base
and are descriptions of the analysis steps. A step is completely
specified if the information in the step and all input data exist in
the data base.
-
Authorization of data.
To avoid ambiguities and reduce mistakes in the
analysis process there may be a need to introduce the concept of
authorized or certified data. E.g. there might be exactly one authorized
calibration for any given run and subdetector. Indirect access to data
via authorized tables of references may be all that is needed.
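Such an authorized table might be as simple as the following sketch,
holding exactly one calibration reference per run and subdetector (all
names invented):

    // Authorized calibration table: one entry per (run, subdetector).
    #include <map>
    #include <string>
    #include <utility>

    typedef std::pair<int, std::string> RunDet;            // (run, subdetector)
    typedef std::map<RunDet, std::string> AuthorizedTable; // -> calibration ref

    // Authorizing a new calibration only updates the reference;
    // the calibration data itself is never destroyed.
    void authorize(AuthorizedTable& t, int run,
                   const std::string& det, const std::string& calRef) {
        t[RunDet(run, det)] = calRef;
    }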
-
Location of data bases.
-
One server many clients.
-
Growing data bases.
-
Readonly data bases.
-
Local copies of data bases.
-
Committing to the master data base.
-
Data base identification.
-
Moving data bases.
-
What functionality (API).
-
Relational or Object oriented.
-
Open and Create Data base.
-
Put, Get and Remove entity.
-
Replace and Update entity.
-
Check entity.
-
Iterate entities.
-
Select and iterate entities.
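A purely illustrative sketch collecting the operations listed above
into one abstract C++ interface (the signatures are assumptions, not an
existing class):

    #include <string>

    class Dbase {
    public:
        virtual ~Dbase() {}
        // Open and Create
        virtual bool open(const std::string& name) = 0;
        virtual bool create(const std::string& name) = 0;
        // Put, Get and Remove
        virtual bool put(const std::string& key, const std::string& val) = 0;
        virtual bool get(const std::string& key, std::string& val) = 0;
        virtual bool remove(const std::string& key) = 0;
        // Replace and Update
        virtual bool replace(const std::string& key, const std::string& val) = 0;
        // Check
        virtual bool exists(const std::string& key) = 0;
        // Iterate; a selection can be applied by the caller per key
        virtual bool first(std::string& key) = 0;
        virtual bool next(std::string& key) = 0;
    };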
-
Choice of data base machinery.
-
Complexity of use.
-
Functionality needed.
-
Data security.
-
Objectivity.
-
MySql.
-
Root.
-
gdbm.
-
Dbase, a small API using gdbm.
-
Separation of application.
-
Interface methods.
-
DbaseEntity.
-
DbaseVersionedEntity.
-
Demo programs.
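As a flavor of what a demo program might contain, here is a minimal
put/get round trip through gdbm itself, the library underneath the
Dbase API (the gdbm calls are the standard C API; the file name and
keys are arbitrary):

    #include <cstdlib>
    #include <cstring>
    #include <gdbm.h>

    int main() {
        GDBM_FILE db = gdbm_open("demo.db", 0, GDBM_WRCREAT, 0644, 0);
        if (!db) return 1;

        datum key, val;
        key.dptr  = (char*)"run1042/TPC/gain";
        key.dsize = std::strlen(key.dptr);
        val.dptr  = (char*)"calibration payload";
        val.dsize = std::strlen(val.dptr);

        gdbm_store(db, key, val, GDBM_REPLACE);  // Put, replacing if present

        datum out = gdbm_fetch(db, key);         // Get; gdbm mallocs out.dptr
        if (out.dptr)
            std::free(out.dptr);                 // caller frees fetched data

        gdbm_close(db);
        return 0;
    }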