On Thu, Mar 02, 2000 at 10:34:50PM +0100, Christian Holm Christensen wrote:
> Hi JH, et al.
>
> On Thu, 2 Mar 2000 12:26:27 -0500 "J.H. Lee" <jhlee@bnl.gov> wrote:
> > Dear Anders,
> >
> > I think the proposed dB is well thought out and nicely organized.
> > I have a few comments/questions on the proposal.
>
> Hmm, I think the current run database (called RUNDB on pii3) isn't
> very good. Actually, I think it should undergo a complete
> revision.

Man, you jumped the gun, and how! The RUNDB database is only just being
organized. There will be more than one table, at least two: "Files",
already fully implemented, and "Runs", in the process of being
implemented. See also the comments below:

> Far too much data is present in the one table that makes up
> this database. Rather, it should be split into a number of tables.
> I could imagine the following: ...
>
> This will ensure better safety, lesser size, and so on. If you're
> familiar with normal-forms (NF) of databases, I doubt that the current
> RUNDB is even in the second NF. What I propose, I believe to be close
> to the 4NF, at least it's 3NF.

There are arguments for and against using normalized databases. I know
that many database textbooks recommend splitting the data into many,
many tables. However, in the RUNDB case, most of the data is "read-only"
(the run information is stored once, and never changes, unlike
calibration data) and the volume of data is "small" (compared to
business databases). Because data is never updated, data integrity can
be easily enforced by the application that fills the database. Because
the data volume is small, data duplication caused by non-normalized
tables is not a big problem.

Also consider the typical access patterns for the run database. Typical
queries search for runs matching some criteria (number of events > 0,
run type != debug, MRS angle > 10, stored into HPSS == true) and return
all or most of the available information about the run.
This actually benefits from storing all information in one table because
it avoids doing the same JOINs over and over again (remember, the data
never changes). This does not come for free: the price is data
duplication, but again, because of the small data volume I would argue
that it is not a problem.

> > - Where will calibrated geometry data belong to? ...
>
> > - How about the Magnet related information? ...
> > - Same question for the environmental parameters. Temperature,
> > Atmospheric pressure and so on..., if there are more than one
> > reading per run.

This depends on the source of the data. Data such as magnet settings,
beam parameters, temperatures and such will be periodically polled by a
program running on opus. This program can trivially store the data into
one or more tables in a MySQL database, indexed by unix time. One can
store them into the event stream, but it is harder to extract numbers
from the event stream than from a database.

> > - I had a discussion with Konstantin about a "File" database. ...
>
> > - Some fields for the Run database(table) have been put in. More fields
> > will be added (or deleted)
> Uh!?

Methinks we should accept it as a fact that as the experiment evolves,
the database will evolve, too, and tables and fields will be added and
removed as needed.

> Of course!? I believe you're wrong. It's true that during
> commissioning we'll create/drop tables, add/delete columns from the
> databases, but this should be minimized. Constructing a good, solid
> schema from the beginning, will prevent much of such dangerous
> operations on the DB's.

Hmm... to build a database that will never ever need to be changed
requires predicting the future. But then maybe you have a better crystal
ball than I do :-)

> Again, I suggest that you revisit your schema
> for RUNDB.
> Also, I'd be happy if you could change the name of the
> database to 'BrahmsRun' as this will work better with BRAT DB classes

The schema on the RUNDB is evolving as we speak. You can always see the
current status at http://pii3.brahms.bnl.gov/~daq/rundb.

> The promised document will hit the web-pages at the earliest tomorrow
> (Friday) and at the latest Monday. I'll keep you posted.

I am looking forward to reading it.

-- 
Konstantin Olchanski
Physics Department, Brookhaven National Laboratory, Long Island, New York
olchansk@bnl.gov
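
For illustration only, here is a minimal sketch of the two storage ideas
argued for in this message: a single wide run table that is searched by
criteria without any JOINs, and periodically polled readings kept in a
table indexed by unix time. It is not the actual RUNDB schema. It uses
Python's built-in sqlite3 module as a stand-in for MySQL, and every
table name, column name and sample value below is an assumption made up
for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    conn.executescript("""
        -- One wide table: all per-run information lives in a single row.
        CREATE TABLE runs (
            run_number INTEGER PRIMARY KEY,
            run_type   TEXT,
            n_events   INTEGER,
            mrs_angle  REAL,
            in_hpss    INTEGER,   -- 1 if stored into HPSS, else 0
            start_time INTEGER,   -- unix time
            end_time   INTEGER    -- unix time
        );
        -- Polled readings, keyed by the unix time of the poll.
        CREATE TABLE readings (
            unix_time INTEGER,
            name      TEXT,
            value     REAL,
            PRIMARY KEY (unix_time, name)
        );
    """)
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, "physics", 5000, 40.0, 1, 1000, 1999),
        (2, "debug",    100, 40.0, 0, 2000, 2999),
        (3, "physics",    0,  5.0, 1, 3000, 3999),
        (4, "physics", 8000,  5.0, 1, 4000, 4999),
    ])
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
        (900,  "temperature", 20.5),
        (1200, "temperature", 21.0),
        (1800, "temperature", 21.4),
        (2500, "temperature", 22.0),
    ])
    return conn


def select_runs(conn, min_angle):
    """Return full rows for runs matching the criteria; no JOIN needed."""
    cur = conn.execute(
        "SELECT * FROM runs "
        "WHERE n_events > 0 AND run_type != 'debug' "
        "AND mrs_angle > ? AND in_hpss = 1 "
        "ORDER BY run_number",
        (min_angle,),
    )
    return cur.fetchall()


def readings_for_run(conn, run, name):
    """Return (unix_time, value) pairs polled while the given run was active."""
    cur = conn.execute(
        "SELECT unix_time, value FROM readings "
        "WHERE name = ? AND unix_time BETWEEN ? AND ? "
        "ORDER BY unix_time",
        (name, run["start_time"], run["end_time"]),
    )
    return [(row["unix_time"], row["value"]) for row in cur]


if __name__ == "__main__":
    conn = build_demo_db()
    for run in select_runs(conn, min_angle=10):
        print(dict(run), readings_for_run(conn, run, "temperature"))
```

The time-indexed table does not need to know about run numbers at all:
readings are matched to a run only at query time, through the run's
start and end times.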