Kris,

BRAHMS Analysis Stages, Event Model,

and the Brahms Root Analysis Toolkit (BRAT)

K.Hagel and F.Videbæk

September 7, 1998

(Draft)

Introduction

This document will deal with the data model, data storage, as it is perceived now. The description of the specific user modules, details of data structures, as well as code organization will be available in other BRAHMS software notes. It will also describe how this is implemented using the BRAT

BRAHMS Analysis Stages

Data occurs from many stages. Due to efficiency in accessing and the desire and need not to duplicate data, they will have exist in different files, part of the data possibly (event tags or run tags) in an Obj-y database. This will minimize duplication of data. Later analysis stages can thus proceed fairly efficient in so far the 'right' data are defined and kept for the later analysis stages in the same files.

The analysis stages envisioned to day are

Simulated Event Generator Data e.g. Fritiof 7.02, Venus.
Generation of digitized data. This can happen in two distinct ways.

GBRAHMS simulations will generate hit data from the event generator data using a detailed description of the detectors. This step will also generate digitized data in a slow or fast mode for the detectors.
DAQ will generate raw data i.e. the data as will be written from the DAQ front end. It may be that another raw data format relating to the DAQ but created in simulations exists.

Generation of Calibration database from raw or digitized data.
Reconstruction, which logically includes generation, calibrated data (intermediate results) local tracks, tracks, and Particle ID, as well as global information
Generation of Reconstructed Data Objects (RDO)
Creation of Physics Data Sets (PhDS) [run, software & hardware trigger selection]
Analysis of data using PhDS.

Each of these stages and the data they generated are discussed in more detail in the following sections.

Production of Event generator data

This includes running event generators like Fritiof, Venus, and others. At present the output format is Zebra based. It can be used in the GBRAHMS program, and in stand-alone programs for looking at expected physics distributions. It is worth to remember that the open model standard proposed by Y.Pang keep the event generator output as ASCII formatted files. We could change our Geant and analysis model to be able to read this. Maybe others (Longacre) have already done this for STAR? It is though put a low priority since a working scheme does exist.

GBRAHMS simulations

This stage takes the data from the event generators and creates events with hits and track information for active detectors. The data are generated from GBRAHMS and are presently written as c-stream files (flat files). The underlying data structures are table like i.e. a collection of structures. This step is quit time consuming using 1-4 minutes per Au Au event depending very much on centrality and angle settings.

gtracks : Track parameters
ghits : Hits in detectors
gvolume : Geant geometry information

DAQ and Online Stage

Raw data from the DAQ which can be in 3 different forms

- translated

- un translated

- pedestal or non-pedestal subtracted data.

Raw data from digitization routines. These will be like raw data, but in addition there will be relations data structures correlating digitized data with simulated hits. These relation or associations should be in separate data structures.
It may turn out to be useful to have a level 0 conversion from a compact raw data format to the first level of digitized data in e.g. ROOT structures. Thus the raw data format are different from the digitized information presented to the first analysis modules. In this case a special module must be called to convert raw data into digitized data. This would possible also help in having the raw data in a more compact format.

Data Calibration

Calibration data (for the calibration database) first have to be generated. Hopefully much of this can be accomplished in near-real time. The second best choice is to have calibration performed during the first step of reconstruction by first executing the calibration pass on the raw data files read from HPSS storage.

Data reconstruction

The first step is Raw-> calibrated data. This requires access to a database with calibration constants, and might additionally include simple clustering of tracking hits. Depending on the CPU needs these data could be short-lived, or persistent. The choice may depend on the need to go back to raw data in subsequent analysis.

Event reconstruction is a second step of data reconstruction. This will involve local and global tracks from the spectrometers. Most of the particle identification should be performed in this stage. It is not obvious that it can be completed. E.g. it is necessary to have the knowledge of tracks to obtain the 'final' time calibration of TOF walls before PID. At the minimum at first order calibration and PID should be archived in this stage.

The data objects stored persistently comes in several classes.

Hits on tracking detectors; track to hit associations.
Local tracks in tracking stations
Global spectrometer tracks
PID information
Multiplicity information, vertex information
Physics tracks

In the MDC these will be on a single output file (root file). In the production mode we will like to split these to produce RDO files from which Physics Data Sets can be extracted efficiently by run and trigger selection, and the more voluminous output of hits, associations, calibrated data. In production mode it is likely that these will only have to be stored for a smaller subset of data (10-15%).

Certainly early on but also later it may be needed to re-run part of reconstruction to get proper particle identification. Another possibility may be depending on i/o versus cpu needs to have standard modules included in the subsequent PhDs generation.

Physics Data Set generation

This data pass will select from the large sample of Reconstructed Data Objects those that can be used in specific physics analyses. Such sets can e.g. consists of

Specific particle kinds and trigger selection for many angle settings
Detailed global information (multiplicity, beam-beam counter).
Two-particle data from a large number of data files, but only a single angle setting.
Calculation of tight multiplicity cuts to select out very central events.

These runs can be selected based on the information in the run database as well as from the event header summary (event tag file, run summary on disk). These selection tasks have to be coordinated to get reasonable access to the HPSS file system and transfer speed. This will most likely be done using the ROOT frame work; maybe using a PROOF like system if the same files are used extensively by multiple analysis/ data set generations.

Analysis ofPhysics Data Sets.

Some of this analysis will be done at RCF, but a significant fraction will be done at the collaborators home institutions.

Event Model

This chapter deals not only with the event model but also with data objects needed at different stages, and how this is implemented in the ROOT classes.

Overview of data objects needed

In addition the division on data on the stages (passes), it seems reasonable also to think on a division of data on the level of detector parts. Each of these parts may or may not be on the same kind of storage e.g. data from TPCs could be on different files. This would help in speeding up later analysis and selection.

Event tags

They are short data structures that includes references (keys) and very truncated information on the other pieces of data belonging to events. The division below is most relevant to raw and calibrated data. The main reason to have these is that the event object model must be aware of data being distributed of several physical files. It can thus not easily employ a comprehensive object model as is done in many ROOT examples. They rely on using pointers as well as memory or disk access to all objects.

Global Data

Beam-Beam counters

Multiplicity
ZDC
DAQ, trigger information

Mid Rapidity Data

MT1, MT2

TOFW

WM1, WM2

Forward spectrometer (FS)

FFS

T1, T2

WF1, WF2

H1, C1

BFS T3, T4, T5

H2, RICH

For later stages it may be more natural with three divisions according to detector component.

Global data

Centrality selection

...

MRS

Tracks

Tracks, PID information

Data structures and analysis modules.

The data structures should be as independent as possible from the analysis modules, which serves as control modules. The BRAT implementation is partly based on some of the concepts.

The event data to be created and accessed by the analysis code can be of many different kinds. Tables are a simple kind of structures which for many analysis stages are quite sufficient. They can be dealt with by having a table header and a list of table objects. It is necessary for the table header to have a description of the data in order to store and retrieve them to and from persistent storage (flat files, Obj-y databases, ROOT files). E.g. in STAF the table components are defined using the Corba interface definition languages (IDL files). Obj-y requires .ddl files while ROOT uses the class description from the .h files. It has though been obvious that also more general event objects are needed. This could be a list of other event objects or a list of some fundamental objects (like TPC clusters, etc). For these to be stored the same kind of argument holds. Thus a description is needed. ROOT does this by defining the objects with the classdef macro's. The files contain the name and version of an object. The code necessary to read and manipulate the objects must be linked into the reading program e.g. using shareable libraries. The objects can only be dealt with if the code reading these can defined them dynamically, and has a reading algorithm present. In other methods e.g. when using an object oriented database like Objectivity all the data objects are derived from a base persistent class, and defined special by description files (DDL files). It is a concern if the data model changes i.e. schema generation and schema evolution.

BRAT event data.

The fundamental object is defined as BrDataObject. All other event objects are derived from these. Since it is not known at this stage (summer 98) what objects are needed it was decided to design a logical structure that is flexible, expandable and built on the Root framework. It still resembles in many ways a traditional 'bank' structure. Event objects are organized in directory like structures. This is illustrated in the figure below. The basic properties of an dataobject

It has a name and can be referred to by it.
It is by default considered persistent i.e. it will be written if a tree it belongs to ,i.e. an event node is requested to be transferred (through an BrIOModule)
It can be an event node.

This is an example of the tree structure. Each BrEventNode contains a list of data object that can be simple, a table or another event node. This way complicated directory like structures can be build. A data object is identified by a name.

Example 1:

How to find a dataobject and access data members.

BrDataTable* bbdiglist;

bbdiglist = digitized_data->GetDataTable("DigBB left");

If( ( bbdiglist != NULL){

Int numhits = bbdiglist->Entries();

For(ihit=0; ihits < numhits; ihit++){

// this is the most general way using the ROOT access methods.

Digbb_p= (BrDigBB*) digbblist->At(ihit);

Or since

User modules

These are the control modules that has a set of well defined entries such that they could be callable by a simple framework or command language. It must be possible to transfer the knowledge of data from the general frame work to user modules without being specific. As an example expressed using C++ like code

User->event( evtobj* obj1 const, evtobj* obj2);

Note the separation the input obj1 and output obj2; the routines are allowed to add to the obj2, but not to obj1. It might even be very useful if all objects are named such that a module can specify what objects are required on input and expected on output. Within the object model described above that is based on ROOT this is achieved naturally.

Thus the framework deals with two quite separate entities, namely event objects and modules which can be called at different instances

module->Begin(Job*)

module->Event(Evtobj,...

module->Finish(...)

module->Book()

The purpose of these entry points is as follows

Event(BrEventNode*, BrEventNode*). This is the normal method for the event processing. The event object needed are found in the first node, while output objects are added to the second event object. Even if module does not need output it should be called with two parameters. This makes for a nice uniform handling of modules and helps with the implementation and usage of ModuleContainers i.e. a collection of modules.
Book(). Method implemented in base BrModule class as an access method . If an analysis module need to create histograms this method should be called. The actual method defining the histograms should be DefineHistograms(). The Book method will ensure that all histograms are added to the histogram list maintained by a module.
Init(). Envisioned to be called once per module per session/run. The main reason this is a separate method from the constructor is that specific parameters/database access has to be done before the init() and which is not so conveniently done through the constructor. It is still a question if the Init() should be called with an environment object, or whether it should have access to global pointer (in the style of ROOT,.. gBratEnvironment).
Finish(). Envisioned to be used to complete analysis, run statistics and eventually saving storing such output in files.

For up to date details the user should consult the automatically generated class description available through the BRAT web pages

Open issues and thoughts.

To keep track of what data which belongs to the same event when multiple structures and files are in use is not a trivial task. It can be done by defining trees or directory structures that contains the information on the event and run number. The event objects as defined above could be the ROOT object for a tree which belongs to a given analysis stage (raw, dst etc.).

- The use of detectors vs. modules;

I would think of a detector more as an object, which has some 'geometry' operators associated with it, rather than performing the 'actions' on hits as you envision. I would tend to use the term Analysis Module more. An analysis module may or may not refer to the 'geometric' properties of a detector. This train of thought come from considering what happens in the later analysis stages e.g. combining tracks, matching tracks with TOF hits, where the object of interest is not a detector.

There is certainly use for a detector concept e.g.

detector

- geometry, coordinate transformation, display, digitization parameters

- display of hits, projecting tracks to detector etc

Some module need to know about detectors; also more than one module may need to know about a given detector. This leads to a object division in classes like

-event objects

-geometry objects

-analysis modules

Objects in these should be allocable dynamically and searchable by name. The default Event() entry are called with an event object as input and one for output.

Geometry should be thought through. There are several coordinate systems in Brahms. The global, the forward spectrometer system (fms1, fms2 as well as the 'turned low momentum T1-D2-T2-H1 position. How do one communicate the objects on coordinate transformations. The transformation is used both for analysis as well as for display purposes.

How to connect the geometry, plotting, and tracking objects and methods.

Magnet classes; effective edge approximation, magnet maps, Tosca. Method to combine tracks in magnets.

Calibration access and connection to non-ROOT databases. This could be to a Obj-y database or an conventional relational database (Oracle).