Re: Run database: Objectivity or Relational database

From: Konstantin Olchanski (olchansk@a2.phy.bnl.gov)
Date: Thu Sep 23 1999 - 00:47:56 EDT


(Second try at sending this!)

There are two separate questions, (con)fused together.
Let me explain.

The first question is: which type of database to use, object or relational?

With an object database, one uses a meta-language
(i.e. Objectivity Data Definition Language, DDL) to describe
the data. Such languages are very flexible and allow one to
define (and store in the database) data structures of arbitrary complexity.
This is very useful for "our" applications, such as calibrations
and run databases.

With a relational database, all data has to be presented to the database
as rows in, possibly multiple, tables. Our applications use
C struct's and C++ classes, and to store them in such database, code
needs to be written to convert them into table rows. Notice that in
an object database this conversion/packing/unpacking code also
exists, but we do not have to write it: it is automatically generated
by the DDL compiler.
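To make the comparison concrete, here is a rough sketch (in C++, with invented field names and a made-up table layout, so treat it as an assumption, not our actual schema) of the kind of hand-written packing code a relational database forces on us:

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical run record; field and table names are illustrative only.
struct RunRecord {
    int         runNumber;
    time_t      startTime;
    time_t      endTime;
    std::string author;
};

// The hand-written "packing" code a relational database requires:
// flatten the struct into one SQL INSERT statement (one table row).
std::string toInsertSQL(const RunRecord& r) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
        "INSERT INTO runs (run_no, start_t, end_t, author) "
        "VALUES (%d, %ld, %ld, '%s');",
        r.runNumber, (long)r.startTime, (long)r.endTime, r.author.c_str());
    return std::string(buf);
}
```

With an object database, the DDL compiler would generate the equivalent of toInsertSQL (and its inverse unpacking code) for us.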

That said, it looks like we want to use an object database for all our needs.

The second question is: do we want to use Objectivity?

Since it is an object database, and we do want an object database,
the answer would be "yes", if we accept the risks and the costs
involved in using this particular object database.

My biggest concerns over Objectivity are:

1) Objectivity is not "Open Source". This makes our project vulnerable
   to support problems:
   - not all combinations of operating systems and compilers are
     supported. Today, I checked the Objectivity web pages. On Linux,
     RedHat 6.0 is still not supported despite being out for more than 1/2
     a year. RedHat 5.1 and 5.2 are supported, but with the now obsolete
     egcs-1.1 compilers (not the 1.1.1 or 1.1.2 compilers). On Solaris,
     sunos5.7 is finally shown as supported, but only with the SUN
     compilers. The DAQ project is using the GNU compilers and no version
     of gcc is supported, so no DAQ code can be linked with Objectivity.
   - do we have any support at all? If we find a bug in Objectivity,
     is there a procedure to report it back to the developers and get it
     fixed in a reasonable time? Do we have a support contract
     with Objectivity? Do we have to go through the middlemen at the RCF?

2) Waning interest in Objectivity at BNL. Two years ago, 4 RHIC experiments
   were interested in using Objectivity. Today, only one, PHENIX,
   is actually using it. STAR tried it and then went with MySQL for
   all their database needs. I do not know what PHOBOS does. There are also
   strong rumours about LHC experiments abandoning Objectivity.
   Maybe they know something that we do not know?

Are you confused yet? Not for nothing was it said, "Do not ask the elves
for advice, for they will say both 'yes' and 'no'".

And now for something completely different:

>
> Concerning the Run DB. If you haven't made a decision on how to go
> about that DB, I suggest we somehow use Objectivity for that
> database. Let me try to summarize the reasons why I think this is a
> good idea.
>
> * Since the RunDB is fairly simple, it should prove to be a simple
> task to implement such a DB in Objectivity. By simple, I mean that -
> as I see it - the information (or entries) in such a DB would
> be rather uniform and simply keyed. I could imagine a doubly linked
> list of C++ classes like:
>
> class RunEntry {
> private:
>   int              fRunNumber;
>   time_t           fStartTime;
>   time_t           fEndTime;
>   vector<string>   fTriggers;
>   string           fAuthor;
>   string           fComments;
>   // and other simple fields that might be needed.
>
>   RunEntry        *fNextEntry;
>   RunEntry        *fPreviousEntry;
> public:
>   RunEntry(int no, time_t start, const vector<string> &trigs,
>            const string &author, ...);
>   void   SetEndTime(void);
>   time_t GetStartTime(void);
>   // and other similar Get and Set methods
> };
>
> and no other kinds of objects need to be stored (apart from
> internal Objectivity indices to facilitate fast lookups and
> queries.)
>

This is oversimplified. Much more information is generated by
pieces of the DA system; it needs to be collected from the different
sources and stored in the run database:

1) the run information, similar to what is listed above, is generated
   by the Run Server.
2) information about the raw data files is generated by the event builder
   output stream drivers and the HPSS interface. This info is written
   into ascii files in the DA "data spool" directory. For each
   output file (there are multiple output files per run, to limit
   the file size) an incomplete list of data fields is:
   - file name,
   - file size,
   - time file was started, ended,
   - event counts for each trigger combination,
   - time file was moved to HPSS,
   - HPSS file name.
3) the data from the per-run scalers. These will be collected by
   the slow controls-type interface.
4) run setup information- spectrometer angles, magnet settings, beam
   information. This will be collected by a run setup gui.
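As a sketch only (the ascii spool-file format in item 2 is not defined yet, so the record layout, key words, and field names below are all invented), the per-file information might be parsed into a C++ struct like this:

```cpp
#include <ctime>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-file record matching the field list above.
struct RawFileInfo {
    std::string fileName;
    long        fileSize = 0;
    time_t      timeStarted = 0, timeEnded = 0;
    std::map<std::string, long> eventCounts;  // per trigger combination
    time_t      timeMovedToHPSS = 0;
    std::string hpssFileName;
};

// Parse one "key value" line of an assumed ascii spool-file format.
void parseLine(RawFileInfo& info, const std::string& line) {
    std::istringstream in(line);
    std::string key;
    in >> key;
    if (key == "file")       in >> info.fileName;
    else if (key == "size")  in >> info.fileSize;
    else if (key == "hpss")  in >> info.hpssFileName;
    else if (key == "count") {            // e.g. "count TRIG1 12345"
        std::string trig; long n;
        in >> trig >> n;
        info.eventCounts[trig] = n;
    }
}
```

Whatever the real format ends up being, the point stands: this record has nested, per-trigger structure that does not map onto a single table row without extra work.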

>
> * The way that I'm currently implementing the Calibration DB, one
> would do lookups on a specific time (in fact a UTC time). However,
> when doing actual data crunching, mining, and analysis, one would
> probably do queries based on run numbers, triggers or run types, or
> any combination of these, and not on time. Hence my idea is
> something like this:
> - One does a query on the RunDB, using some combination of lookup
> keys, and a (or possibly a set of) RunEntry object(s) is(are)
> returned.
> - Using the fStartTime, and fEndTime of the RunEntry object(s), the
> program does a query on the CalibDB for calibrations valid in
> that time period, for any detector needed.
>

Some data is best accessed by time, because it can change or is recorded
independently of run numbers. An example is all the data collected
by slow controls (magnet settings, temperatures, pressures, high-voltages).
Other data is best accessed by run number. Maybe you need to have ways
to access data using either of the two keys: event time or event run number.

>
> * Still, if a relational DB (SQL) query facility is built into BRAT/ROOT,
> there is one more argument for the use of Objectivity: If one
> really wanted to do so, one could add references from the run entries
> to specific calibrations, using any kind of relation (uni-directional,
> bi-directional, 1-1, 1-n, n-1, n-n).
>

But no matter what the underlying implementation is (Objy or SQL),
the user interface is "getData(time)" and "getData(runNo)". As long as
how the data is stored and retrieved is hidden from the user,
one cannot make arguments in favour of either implementation.
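A minimal sketch of such an implementation-hiding interface (all names invented, and the dummy backend stands in for either Objy or MySQL; note the overloads assume time_t is not the same type as int):

```cpp
#include <ctime>
#include <string>

// Implementation-neutral run-database interface: the caller sees only
// the two lookup keys; no Objy or MySQL details leak through.
class RunDatabase {
public:
    virtual ~RunDatabase() {}
    virtual std::string getData(time_t when) = 0;  // lookup by event time
    virtual std::string getData(int runNo)   = 0;  // lookup by run number
};

// Trivial stand-in backend; a real one would talk to Objy or MySQL.
class DummyRunDatabase : public RunDatabase {
public:
    std::string getData(time_t) override { return "by-time"; }
    std::string getData(int)    override { return "by-run"; }
};
```

Swapping backends then means writing a new subclass; no analysis code that goes through RunDatabase would have to change.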

The best you can say is "the database package X will let me do it
faster/better/easier than package Y". And then, it is always a question
of "do some work now versus do some later" tradeoffs, which involve
crystal balls and other gadgets to predict the future.

My crystal ball tells me that using MySQL will require more work
upfront (we will have to write the code that the Objy DDL compiler
writes automatically), while using Objy will be less work upfront
and more work later, as we hit problems porting the software to new
versions of compilers (when the whole world will be using RedHat
SuperHyperLinux 2010 with GCC-9.99, will we be stuck with RedHat 5.2
and egcs-1.1?), or try to run on fancy new computers (those Linux/Alpha
computers are damn fast and have good bang per buck. Does Objy
support Linux/Alpha? Will we never ever want to run on an Alpha?
Come back with the answer next year. How about Linux/IA-64?)
or just hit plain boring bugs in Objy.

>
> * Finally, there is a way to do SQL queries on an Objectivity
> database, using the Objectivity/SQL++ component of
> Objectivity. However, I know nothing of this component, since it
> isn't available yet on Linux, at least not at RCF!
>

This bothers me. What exactly is Objectivity, Inc.'s commitment
to Linux? RedHat 6.0 is still not supported despite being out for
a long time now. Now we see important parts of Objy missing. What
is their record on customer support issues (such as accepting
problem reports and issuing fixes and updates)?

After being burned by enough nice commercial products that quickly
turned into maintenance headaches I am very sensitive to this.

>
> Anyway, these are a couple of thoughts I had on the subject. If a
> decision has already been made, please let me know. I believe you
> advocated for a relational RunDB when we talked this summer, but
> maybe my points can make you reconsider.
>

No decision has been made.

As far as the D.A. and Slow controls are concerned, the plan is to
have various system components record the information in
ascii files. Whatever database system we use will read these ascii files
and store the data in the real database. Since the database could be
completely rebuilt from the ascii files, they will be kept
around (maybe archived to tape) as backups against database corruption.
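As an illustration of the rebuild idea (the one-record-per-line format and field layout here are pure assumptions, since the real ascii format is not fixed yet), regenerating the database from the ascii files could look roughly like this:

```cpp
#include <map>
#include <sstream>
#include <string>

// The ascii spool files are the source of truth; the database is just
// a derived view that can be regenerated from them at any time.
// Assumed format: one "runNo comment" record per line.
std::map<int, std::string> rebuildFromAscii(const std::string& asciiDump) {
    std::map<int, std::string> db;   // runNo -> comment, stand-in for the real DB
    std::istringstream in(asciiDump);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream rec(line);
        int runNo;
        std::string comment;
        if (rec >> runNo && std::getline(rec, comment))
            db[runNo] = comment.substr(comment.find_first_not_of(' '));
    }
    return db;
}
```

The nice property is that database corruption is never fatal: drop the tables, re-run the loader over the archived ascii files, and the database is back.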

This segregation between the database and D.A./S.C. systems calls for a well
documented interface between them (a good thing in itself) and
keeps our options open as far as the choice of compilers and operating
systems is concerned- the database system would not even have to run
on the same computer as the D.A. and the S.C. systems.

If time permits, I may build a simple SQL database for a subset of run and
raw data file information. With MySQL and the DBI.pm and CGI.pm perl modules
this would be a quickie "one afternoon" project. Most likely, the resulting
database will be useful for interactive queries using a Web interface
and not so useful/usable in the reconstruction/BRAT/ROOT framework.

-- 
Konstantin Olchanski
Physics Department, Brookhaven National Laboratory, Long Island, New York
olchansk@bnl.gov




This archive was generated by hypermail 2b29 : Tue Feb 01 2000 - 20:35:04 EST