Brahms Raw Data format proposal

From: Konstantin Olchanski (olchansk@a2.phy.bnl.gov)
Date: Mon Aug 24 1998 - 11:01:59 EDT


Hi, folks! Please find attached the proposal for the Brahms Raw
Data Format. It defines the format of the data that will come out
of the DAQ system, go into the RCF data store, and be distributed
to the online detector monitoring programs.

Konstantin Olchanski, olchansk@bnl.gov

Date: 14 Aug 1998
Author: K.Olchanski
Description: Raw Data Format

0. Definitions:
   "LE": little-endian byte ordering (Intel, VAX, Alpha, any WinNT system)
   "BE": big-endian byte ordering (IBM, SGI/MIPS, SUN, PPC).
   "uint32": unsigned 32-bit integer

0.1. Byte swapping issues

   Why byte swapping? The DAQ front-end VME CPUs are all big-endian (BE
   PPC and m68k). The DAQ back-end host computer will be either BE (SUN)
   or LE (WinNT box). The reconstruction computers (both the BRAHMS online
   analysis and the RCF farms) will be LE (Intel-based systems running
   Linux). Therefore BE-to-LE conversion of the raw data cannot be avoided.

   The byte swapping approach implemented in this data format is:

   - all byte counters and other formatting words are defined
     to be LE (little-endian). These words will not normally be exposed
     to data users, who are expected to access the data through a
     library of C/C++ functions. The overhead of swapping these bytes
     on the SUN SPARC CPUs (the only BE machines we expect to use)
     should be small.
   - to reduce the byte swapping overhead for user-visible data, each
     data record inside an event can be byte-swapped individually,
     as needed. To do this, each data record carries an endianness bit,
     which tells the reader whether the record must be byte-swapped,
     and a record format identifier, which tells the reader which
     byte swapping function to call.
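
   A minimal C99 sketch of the two rules above, assuming a reader
   library written in C. The helper names (get_le32, put_le32,
   host_is_le, swap_words32, record_needs_swap) are hypothetical and not
   part of this proposal. Formatting words are decoded byte by byte, so
   the same code runs unchanged on LE and BE hosts; a record payload is
   swapped only when its endianness bit disagrees with the host.
   swap_words32 assumes a payload made of 32-bit words; records of other
   word sizes would need a different function, selected by the format code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian uint32 from 4 bytes. Byte-by-byte assembly
   gives the same result on LE (Intel) and BE (SPARC, PPC) hosts. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Encode v as 4 little-endian bytes (writer side, e.g. the DAQ host). */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* 1 on a little-endian host, 0 on a big-endian host. */
static int host_is_le(void)
{
    const uint32_t probe = 1u;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical swapper for a payload of 32-bit words:
   reverses the 4 bytes of each word in place. */
static void swap_words32(unsigned char *data, size_t nbytes)
{
    assert(nbytes % 4 == 0);  /* record data length is a multiple of 4 */
    for (size_t i = 0; i < nbytes; i += 4) {
        unsigned char b0 = data[i], b1 = data[i + 1];
        data[i]     = data[i + 3];
        data[i + 1] = data[i + 2];
        data[i + 2] = b1;
        data[i + 3] = b0;
    }
}

/* Flags bit 0: 0 = record is LE, 1 = record is BE.
   Swap only when the record's byte order differs from the host's. */
static int record_needs_swap(uint32_t flags)
{
    int record_is_be = (int)(flags & 0x1u);
    return record_is_be == host_is_le();
}
```

   On an Intel reader, a record written by a BE front-end CPU (bit 0 set)
   gets swapped while an LE record is used as-is; on a SPARC host the
   roles are reversed.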

1. "Raw Data File" format

   A "raw data file" is a disk file containing raw data written by the
   DAQ system. The data sent to the RCF will also use this format.
   The file will contain a sequence of "events" concatenated together:

   | event1 | event2 | ... | eventN |
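
   Because every event starts with its own byte count, a reader can walk
   a raw data file without looking inside the events. The sketch below
   assumes the whole file has already been read into memory; count_events
   and get_le32 are hypothetical names. It returns -1 when the byte counts
   do not tile the buffer exactly, e.g. for a truncated file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian uint32 from 4 bytes, independent of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Count the events in a raw data file image held in memory.
   Returns the number of events, or -1 if the byte counts do not
   tile the buffer exactly (truncated or corrupt file). */
static long count_events(const unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    size_t off = 0;
    long nevents = 0;

    while (off < size) {
        if (size - off < 4)
            return -1;                 /* partial byte count at the end */
        uint32_t count = get_le32(buf + off);
        /* count = raw data block + 4-byte CRC-32, excluding itself;
           both parts are multiples of 4 */
        if (count < 4 || count % 4 != 0 || count > size - off - 4)
            return -1;
        off += 4 + (size_t)count;      /* skip byte count + event body */
        nevents++;
    }
    return nevents;
}
```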

2. "event" format

   Event format:

   -
   | 4 bytes (LE uint32): byte count (See note 2a)
   -
   |
   |
   | raw data block (See note 2b)
   |
   |
   -
   | 4 bytes (LE uint32): CRC-32 (See note 2c)
   -

Notes:
   2a) the byte count is in little-endian format and includes the size of
       the raw data block plus the 4 bytes of CRC-32. The byte count does
       not include itself.
   2b) the size of the raw data block *has* to be a multiple of 4 bytes.
   2c) the CRC-32 is used to check data integrity. It is in little-endian
       format, and I intend to use the same algorithm I used for the E852
       data. If CRC calculations turn out to be prohibitively expensive,
       the CRC word will be set to zero (or some other well-defined value).
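
   A hedged sketch of checking one event against notes 2a-2c. Two points
   are ASSUMPTIONS, not part of this proposal: the CRC-32 variant is the
   common IEEE 802.3 / zlib one (reflected polynomial 0xEDB88320), which
   may differ from the E852 algorithm, and the CRC covers the raw data
   block only. A stored CRC of zero is treated as "not computed", per
   note 2c; with that convention a genuine CRC that happens to be zero
   cannot be told apart from a skipped one. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian uint32 from 4 bytes, independent of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Bitwise CRC-32, IEEE 802.3 / zlib variant (ASSUMED; E852 may differ). */
static uint32_t crc32_ieee(const unsigned char *data, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

enum event_status { EVENT_OK, EVENT_NO_CRC, EVENT_BAD_SIZE, EVENT_BAD_CRC };

/* ev points at the 4-byte byte count of one event; avail is the
   number of bytes available from ev onward. */
static enum event_status check_event(const unsigned char *ev, size_t avail)
{
    assert(ev != NULL);
    if (avail < 8)
        return EVENT_BAD_SIZE;
    uint32_t count = get_le32(ev);           /* block + CRC, not itself */
    if (count < 4 || count % 4 != 0 || count > avail - 4)
        return EVENT_BAD_SIZE;
    const unsigned char *block = ev + 4;
    size_t block_len = (size_t)count - 4;    /* raw data block only */
    uint32_t stored = get_le32(block + block_len);
    if (stored == 0)
        return EVENT_NO_CRC;                 /* writer skipped the CRC */
    return crc32_ieee(block, block_len) == stored ? EVENT_OK : EVENT_BAD_CRC;
}
```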

3. "raw data block" format

   -
   | 4 bytes (LE uint32): format identifier word,
   | always set to 0xffffff50, see Note 3a.
   -
   | 4 bytes (LE uint32): number of data records
   -
   |
   | data record 1
   |
   -
   |
  ...
   |
   -
   |
   | data record N
   |
   -

Notes:
   3a) it is useful to have a "data marker" or "tag" or some kind
       of identifier somewhere in the data that can tell us that
       "the data that we got is the data we expected to get". This marker
       or identifier can be used to:
       - verify that a data file is in fact a BRAHMS raw data file, as
         opposed to some other binary or text file.
       - while reading from a tape, a pipe, a network TCP connection,
         or any other type of data stream with no natural record
         delimiters, if the reader somehow gets out of sync with the stream,
         a fixed marker in the data can be used to resynchronize the reader
         with the stream.
       
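
   The sketch below shows both uses of the marker from note 3a:
   validating a raw data block header, and resynchronizing by scanning a
   byte stream for the LE byte sequence 50 ff ff ff. Function names are
   hypothetical. Since the marker sits 4 bytes into an event (right after
   the byte count), a match at offset p gives a candidate event start at
   p - 4. The marker bytes can also occur by chance inside event data, so
   a real reader would still confirm each candidate with the byte count
   and CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRAHMS_RAW_MARKER 0xffffff50u   /* format identifier word */

/* Decode a little-endian uint32 from 4 bytes, independent of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Check the header of a raw data block; on success store the number
   of data records in *nrecords and return 1, otherwise return 0. */
static int parse_block_header(const unsigned char *blk, size_t len,
                              uint32_t *nrecords)
{
    assert(blk != NULL && nrecords != NULL);
    if (len < 8 || get_le32(blk) != BRAHMS_RAW_MARKER)
        return 0;
    *nrecords = get_le32(blk + 4);
    return 1;
}

/* Resynchronize: return the offset of the next candidate event start
   at or after 'from' (4 bytes before a marker match), or -1 if none. */
static long find_next_event(const unsigned char *buf, size_t size,
                            size_t from)
{
    assert(buf != NULL);
    for (size_t p = from + 4; p + 4 <= size; p++) {
        if (get_le32(buf + p) == BRAHMS_RAW_MARKER)
            return (long)(p - 4);
    }
    return -1;
}
```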

4. "data record" format

   -
   | 4 bytes (LE uint32): record length in bytes
   -
   | 4 bytes (LE uint32): record identifier
   -
   | 4 bytes (LE uint32): 32 bits of flags, the bits are:
   | bit 0: 0 = record is in LE format, 1 = record is in BE format
   | bits 4..7: 4-bit data format code.
   -
   |
  ... record data, aligned to 4 bytes, length is a multiple of 4.
   |
   -
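
   A sketch of decoding one data record header. The proposal does not yet
   say whether the record length includes the 12-byte header; this sketch
   ASSUMES it counts only the record data that follows. The struct and
   function names are hypothetical, and flag bits other than bit 0 and
   bits 4..7 are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct record_header {
    uint32_t length;            /* ASSUMED: bytes of record data, header excluded */
    uint32_t id;                /* record identifier */
    uint32_t flags;             /* raw flags word */
    int      is_be;             /* bit 0: 0 = LE payload, 1 = BE payload */
    unsigned format;            /* bits 4..7: data format code, 0..15 */
    const unsigned char *data;  /* first byte of record data */
};

/* Decode a little-endian uint32 from 4 bytes, independent of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Parse the record at rec (avail bytes left in the raw data block).
   Returns the bytes consumed (12-byte header + data), or 0 on error. */
static size_t parse_record(const unsigned char *rec, size_t avail,
                           struct record_header *h)
{
    assert(rec != NULL && h != NULL);
    if (avail < 12)
        return 0;
    h->length = get_le32(rec);
    h->id     = get_le32(rec + 4);
    h->flags  = get_le32(rec + 8);
    if (h->length % 4 != 0 || h->length > avail - 12)
        return 0;                      /* misaligned or overruns the block */
    h->is_be  = (int)(h->flags & 0x1u);
    h->format = (unsigned)((h->flags >> 4) & 0xFu);
    h->data   = rec + 12;
    return 12 + (size_t)h->length;
}
```

   Calling parse_record repeatedly, advancing by its return value, walks
   the N records announced in the raw data block header.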

5. Data record format definitions: (to be done later)

6. What data records will exist:

   - event header record: to contain
        * run number, event number, etc...
        * a few words of trigger data
   - per-VME-crate data buffers: these most likely will only be used
        during initial system testing and will not be present in the
        output data stream for "real" data
   - per-fastbus-module, per-SBE, per-SBE-CAMAC module data buffers,
        same as above.
   - trigger record: all the trigger information
   - per-detector/per-sub-detector data records.

KO