Summary of Mock Data Challenge 2

Sinking raw data into HPSS

Raw data was sunk into HPSS from the bramsink account using Tom Throwe's script raw_transfer_pftp.pl.
  1. Feb 20 (16:50-22:36) on rcf.rhic.bnl.gov using sink.pl from rmds03:/disk30000/brahms/gbrahms_output to rmds01:/home/bramsink/mdc2/gbrahms_output/5.
    The 39 files (13515720288 B) were sunk 5 times in 10321 seconds, giving a data rate of 6.24 MB/s. There were 5 errors during the transfer: two files needed two retries each to succeed and one file needed one retry.
    The input file was sink_mdc2_5.in and the log files were sink_mdc2_5.log and sink_mdc2_5_database.txt.
  2. Feb 21-22 (11:45-02:24) on rsun00.rhic.bnl.gov using sink2.pl from rmds03:/diskA/brahms/mdc2/gbrahms_output to rmds01:/home/bramsink/mdc2/gbrahms_output/6 and rmds01:/home/bramsink/mdc2/gbrahms_output/7.
    1. The first attempt (log file sink_mdc2_6_database_1.txt) at sinking the 39 files took 1972 seconds, giving a data rate of 6.5 MB/s.
    2. The second attempt (input file sink_mdc2_6.in, log files sink_mdc2_6.log and sink_mdc2_6_database_5.txt) crashed after a little more than 5 iterations when more than 10 tries were needed to transfer a file. 67578601440 B in 10340 s gave a data rate of 6.2 MB/s. There were 16 errors during the transfer. The numbers of retries before success were 5, 7, 1, 1 and 2.
    3. The third attempt (input file sink_mdc2_7.in, log files sink_mdc2_7.log and sink_mdc2_7_database.txt) successfully sank all 39 files 10 times, 135157202880 B in 22165 s, giving a data rate of 5.8 MB/s. There were 12 errors. The numbers of retries were 3, 2, 2, 1, 1, 1 and 2.
    The data rates per file for sinking (black triangles pointing downwards) and the accumulated average rate (red triangles pointing downwards) are shown in this figure.
Konstantin Olchanski did some tests sinking data directly from a program using FTP and from a RAM disk using raw_transfer_pftp.pl.
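The MB/s figures quoted above are consistent with the binary convention (1 MB = 2^20 B). A small sketch, assuming that convention, reproduces the quoted rates from the byte counts and wall-clock times:

```python
# Reproduce the sinking data rates quoted above.
# Assumption: 1 MB = 2**20 B, which matches the numbers in this report.

def rate_mb_per_s(nbytes: int, seconds: int) -> float:
    """Average transfer rate in MB/s (1 MB = 2**20 B)."""
    return nbytes / seconds / 2**20

# Feb 20 run: 39 files (13515720288 B) sunk 5 times in 10321 s.
print(round(rate_mb_per_s(5 * 13515720288, 10321), 2))  # 6.24

# Third attempt on rsun00: 135157202880 B in 22165 s.
print(round(rate_mb_per_s(135157202880, 22165), 2))     # 5.82 (quoted as 5.8)
```

With the decimal convention (1 MB = 10^6 B) the first rate would come out as 6.55 MB/s, so the binary convention is clearly what was used.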

Data reconstruction

Changes to BRAT:
  1. BrCombineTrack::CombineT3T4T5: Assignments to T4DetectorTracks and T5DetectorTracks moved in front of test for DebugLevel()>0 in order for the code to match tracks when debug printing is switched off.
    This change was committed into the repository March 1, 1999.
  2. BrLocalTrackDC::Event: The three Clear() calls at the end caused a memory leak and were removed. The segmentation fault described in the comment was avoided by linking with the ROOT New library (-lNew).
    The change has not been committed into the repository.
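The BrCombineTrack fix above is an instance of a general pitfall: code with side effects placed inside a debug-only branch silently stops running when debugging is switched off. A language-neutral sketch of the pattern (the names below are made up for illustration and are not BRAT's actual API):

```python
# Illustrative sketch of the BrCombineTrack-style bug: assignments
# needed by later matching code were guarded by a debug check.
# All names here are hypothetical; this is not BRAT code.

def combine_tracks_buggy(tracks, debug_level=0):
    matched = []
    if debug_level > 0:
        # Bug: the matching work lives inside the debug branch,
        # so with debug_level == 0 nothing is ever matched.
        matched = [t for t in tracks if t["front"] and t["back"]]
        print(f"matched {len(matched)} tracks")
    return matched

def combine_tracks_fixed(tracks, debug_level=0):
    # Fix: do the work unconditionally; keep only the printing guarded.
    matched = [t for t in tracks if t["front"] and t["back"]]
    if debug_level > 0:
        print(f"matched {len(matched)} tracks")
    return matched

tracks = [{"front": True, "back": True}, {"front": True, "back": False}]
assert combine_tracks_buggy(tracks) == []      # silently empty with debug off
assert len(combine_tracks_fixed(tracks)) == 1  # matches regardless of debug level
```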
The simulation files 323, 325-357 and 362-365 were successfully digitized and reconstructed. However, HPSS problems crashed several jobs.

Summary of reconstruction jobs:
  1. Batch 0 (files in mdc2/(jsf,log,rdo)/0)
    Geant job  Input size [B]  Last event  Real time  CPU time [s]  Output size [B]  Job script
    327  812696648  20000  25:30:49   91659.52  631948569  crsjf_011
    329  478099952  20000   7:06:07   25487.31  314619237  crsjf_012
    340  432393528  10000  41:20:11  148673.77  414355312  crsjf_013
    344  294831024  10000   8:25:46   30221.28  299541596  crsjf_014
    362  147844784   3300  22:22:52   80471.33  201642658  crsjf_016
    364  358506552   8000  70:06:47  251970.60  509741741  crsjf_018
    365  357957424   8000  70:16:29  252736.86  506345223  crsjf_019
    CRS jobs 15 and 17 were killed at the end of MDC2.
  2. Batch 1 (files in mdc2/(jsf,log,rdo)/1)
    Geant job  Input size [B]  Last event  Real time  CPU time [s]  Output size [B]  Job script
    330  287187636  11700   4:29:44   16156.89  189753781  crsjf_020
    331  492232328  20000   7:23:24   26547.81  329796382  crsjf_021
    332  479139808  20000   7:11:21   25788.59  316003946  crsjf_022
    333  225664408  10100   1:20:04    4790.38  129665944  crsjf_023
    334  444890424  20000   3:28:14   12459.76  258471506  crsjf_024
    335  247014352  10000   5:49:40   20897.09  162479291  crsjf_025
    336  247014352  10000   5:27:01   19595.02  162479940  crsjf_026
    337  171907256   6900   3:01:24   10870.73  112539840  crsjf_027
    338  244285184  10000   3:52:28   13914.60  162559239  crsjf_028
    339  244144168  10000   4:03:14   14539.75  162573364  crsjf_029
  3. Batch 2 (files in mdc2/(jsf,log,rdo)/2)
    Geant job  Input size [B]  Last event  Real time  CPU time [s]  Output size [B]  Job script
    325  115339264   1000   3:23:04   12153.75   78051922  crsjf_065
    326  641893076   6000  14:35:20   52367.34  428530365  crsjf_066
    327  812696648  20000  25:41:38   92248.45  631948677  crsjf_030
    328  466779644  13500  18:05:14   65036.43  440055173  crsjf_031
    329  478099952  20000   5:02:28   18108.72  314620795  crsjf_032
    330  287187636  11700   4:33:48   16374.49  189753998  crsjf_033
    331  492232328  20000  10:02:28   36052.70  329797669  crsjf_034
    332  479139808  20000   5:06:53   18370.23  316005628  crsjf_035
    333  225664408  10100   1:20:16    4796.85  129664847  crsjf_036
    334  444890424  20000   3:29:35   12544.57  258470829  crsjf_037
    335  247014352  10000   5:45:27   20671.54  162480195  crsjf_038
    336  247014352  10000   5:46:37   20738.11  162479443  crsjf_039
    337  171907256   6900   3:10:23   11384.35  112539988  crsjf_040
    338  244285184  10000   4:08:53   14877.20  162559692  crsjf_041
    339  244144168  10000   4:04:32   14607.71  162572857  crsjf_042
    340  432393528  10000  43:50:37  157441.66  414355803  crsjf_043
    341  429009592  10000  43:29:09  156246.17  433597979  crsjf_044
    342  429361920  10000  44:43:25  160635.13  433304684  crsjf_045
    343  432579752  10000  43:32:21  156394.72  416439415  crsjf_046
    344  294831024  10000   8:24:56   30224.66  299542591  crsjf_047
    345  297765532  10000   8:41:49   31232.15  301146157  crsjf_048
    346  291193176  10000   7:59:26   28685.13  296403391  crsjf_049
    347  524861080   4900  24:52:38   89471.48  379408014  crsjf_050
    348  189194240   7400   3:12:17   10461.08  128718685  crsjf_051
    349  188481536   7400   3:25:43   12302.09  127943778  crsjf_052
    350  193269760   7400   4:26:31   15943.69  133853025  crsjf_053
    351  441037776  16900   9:39:01   34661.63  309752173  crsjf_054
    352  235995444   9100   5:38:17   20236.07  163761532  crsjf_055
    353  236033676   9100   3:56:25   14140.90  163305517  crsjf_056
    354  236194096   9100   5:51:46   21065.52  164088200  crsjf_057
    355  235159952   9000   5:53:51   21201.36  162815349  crsjf_058
    356  236577652   9100   5:40:31   20331.97  164328518  crsjf_059
    357  230412140   8800   5:30:42   19771.35  160477547  crsjf_060
    362  147844784   3300  28:53:12  103806.35  201643654  crsjf_061
    363  147559800   3300  22:40:52   80531.70  205269215  crsjf_062
    365  357957424   8000  66:33:42  239366.12  506345735  crsjf_064
    CRS job 63 was killed at the end of MDC2.
  4. Batch 3 (files in mdc2/(jsf,log,rdo)/3)
    Geant job  Input size [B]  Last event  Real time  CPU time [s]  Output size [B]  Job script
    323  300878776  10000   5:52:39   21034.19  378919547  crsjf_101
    325  115339264   1000   3:22:23   12083.79   78052017  crsjf_103
    326  641893076   6000  19:36:07   70348.25  428530156  crsjf_104
    327  812696648  20000  24:33:38   88257.14  631951376  crsjf_105
    328  466779644  13500  13:52:24   49839.71  440056997  crsjf_106
    329  478099952  20000   5:08:29   18451.97  314618419  crsjf_107
    330  287187636  11700   4:37:23   16596.37  189754628  crsjf_108
    331  492232328  20000  10:43:12   38442.23  329795823  crsjf_109
    332  479139808  20000   7:03:31   25293.01  316004730  crsjf_110
    333  225664408  10100   1:22:20    4862.52  129666206  crsjf_111
    334  444890424  20000   4:50:30   17340.30  258471045  crsjf_112
    335  247014352  10000   5:50:53   20990.90  162479614  crsjf_113
    336  247014352  10000   6:05:00   21822.17  162480583  crsjf_114
    337  171907256   6900   3:09:54   11338.37  112540285  crsjf_115
    338  244285184  10000   3:55:14   14092.06  162558648  crsjf_116
    339  244144168  10000   3:01:57   10885.75  162573400  crsjf_117
    342  429361920  10000  32:48:30  117972.41  433302190  crsjf_120
    344  294831024  10000   8:42:54   31273.66  299542787  crsjf_122
    345  297765532  10000   8:50:10   31703.48  301145672  crsjf_123
    346  291193176  10000   8:31:02   30567.63  296402262  crsjf_124
    347  524861080   4900  25:47:52   92658.01  379409450  crsjf_125
    348  189194240   7400   4:07:41   14764.03  128719154  crsjf_126
    349  188481536   7400   2:25:51    8742.15  127943799  crsjf_127
    350  193269760   7400   4:29:11   16103.57  133853221  crsjf_128
    351  441037776  16900   9:57:00   35690.20  309751530  crsjf_129
    352  235995444   9100   5:43:49   20587.29  163762091  crsjf_130
    353  236033676   9100   5:21:12   19213.96  163305447  crsjf_131
    354  236194096   9100   4:27:16   16003.00  164087915  crsjf_132
    355  235159952   9000   5:56:46   21370.69  162815594  crsjf_133
    356  236577652   9100   5:48:00   20798.27  164329073  crsjf_134
    357  230412140   8800   5:28:14   19638.51  160477218  crsjf_135
    362  147844784   3300  22:15:17   80032.10  201643500  crsjf_136
    363  147559800   3300  29:27:03  105956.77  205270411  crsjf_137
    CRS jobs 102, 118, 119, 138 and 139 were killed at the end of MDC2.
The differences in CPU time for the same Geant job are caused by the different CPU speeds of the CRS nodes (350 MHz and 450 MHz).
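The real-time and CPU-time columns can be compared directly once the H:MM:SS real time is converted to seconds; for the quoted rows the ratio is close to 1, i.e. the jobs were essentially CPU-bound. A small sketch using the batch 0 entry for Geant job 327 (real time 25:30:49, CPU time 91659.52 s):

```python
# Convert a Real time entry (H:MM:SS, hours may exceed 24) to seconds
# and compare it with the CPU time column.

def real_time_to_seconds(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# Batch 0, Geant job 327 (crsjf_011): real time 25:30:49, CPU time 91659.52 s.
real = real_time_to_seconds("25:30:49")
print(real)                        # 91849
print(round(91659.52 / real, 3))   # 0.998 -> almost all wall-clock time was CPU time
```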

Some of the HPSS file attributes (file name, file size, tape ID and position) of the DST files are given in the file ~bramreco/mdc2/log/HPSS_file_attribs.txt.

CPU time (in seconds) used by the different BRAT modules for batch 3
Geant job  Digitize (BB, T3, T4, T5, TPC1, TPC2, TPM1, TPM2)  Local tracking (T3, T4, T5, TPC2, TPM1, TPM2)
32333 1 0 0 1 0 4619 56 1 0 0 313871 551
32511 8 3 2 2666 194 3 3 3358 34501491 475 8 8
326633814 914234 951 4 3217622324050902518 8 8
3275924 9 7 4105 328 9251 10728785129073000150824160 908
3283217 7 5 2543 264 5040 5815208 75733220 98412744 421
3294810 3 2 889 79 1 011496 3496 797 585 3 3
33033 9 4 3 724 90 0 0 9996 30011321 606 3 3
3316520 8 6 1413 184 1 124256 777819271224 6 6
3326515 5 3 1174 116 1 114272 62621107 812 6 5
33333 5 2 1 144 18 0 0 2779 762 363 192 3 2
33464 9 3 3 274 34 1 112177 2808 489 362 5 5
33529 8 3 3 651 77 0 014491 3626 901 504 2 2
33633 9 4 3 718 85 0 014806 3828 942 614 3 3
33723 7 3 2 489 63 0 0 5760 3131 961 371 2 2
3383210 4 3 745 87 0 0 7129 37681056 508 3 2
33923 7 2 2 546 67 0 0 5720 2880 706 387 1 2
34225 0 0 0 0 038613 190 0 0 0 276524 886
34432 0 0 0 1 0 8620 79 0 0 0 320433 589
34532 0 0 0 1 0 8615 81 0 0 0 320870 580
34631 0 0 0 1 0 8388 77 0 0 0 320007 569
34752 1 0 0 0 028249 197 8 8 8 10610371005
34824 8 3 2 792 81 0 0 8274 3808 654 495 3 2
34918 5 2 1 592 60 0 0 4972 2084 269 308 1 1
35024 9 4 3 925 113 0 0 7366 47521692 580 2 2
3515521 9 7 2071 256 1 017870 933133101297 5 4
3523012 5 4 1121 134 0 0 9694 61111989 706 3 3
3533012 5 3 1110 127 0 0 9604 54741414 662 3 3
35421 8 4 2 827 99 0 0 9201 3947 852 479 1 1
3552911 5 3 1109 126 1 010764 58392047 661 3 3
3563011 5 4 1130 131 0 011001 56861322 694 3 2
3572610 4 3 1003 121 0 0 9692 59591493 613 2 2
362 0 0 0 0 0 028059 84 9 9 9 1050723 418
363 1 0 0 0 0 037565 104 9 9 9 966669 549

Unsinking reconstructed data from HPSS

Reconstructed data was extracted from HPSS to the bramreco account using Tom Throwe's script dst_get_pftp.pl.
  1. Feb 20 on rcf.rhic.bnl.gov from rmds01:/home/bramreco/mdc2/rdo/0 and rmds01:/home/bramreco/mdc2/rdo/2 to rmds03:/disk30000/brahms/mdc2/dst.
    The 32 files (8.2 GB) were extracted in 8580 s giving a data rate of 0.95 MB/s.
  2. Feb 21-22 (14:00-03:10) on rsun00.rhic.bnl.gov using unsink2dst.pl from rmds01:/home/bramreco/mdc2/rdo/0, rmds01:/home/bramreco/mdc2/rdo/2 and rmds01:/home/bramreco/mdc2/rdo/3 to rsun00:/diskA/brahms/mdc2/dst.
    1. The first attempt (input file mdc2/unsink/unsink2dst_0.in, log files mdc2/unsink/unsink2dst_0.log and mdc2/unsink/unsink2dst_0_database.txt) transferred 1138530850 B (3 files) in 553 s, giving a data rate of 1.96 MB/s. One of the files, which was still in the HPSS disk cache, transferred at 7.6 MB/s.
    2. The second attempt (input file mdc2/unsink/unsink2dst_1.in, log files mdc2/unsink/unsink2dst_1.log and mdc2/unsink/unsink2dst_1_database.txt) transferred 21079161630 B (82 files) in 24014 s giving a data rate of 0.84 MB/s. The files were requested in the sequence they were written to tape.
    3. The third attempt (input file mdc2/unsink/unsink2dst_2.in, log files mdc2/unsink/unsink2dst_2.log and mdc2/unsink/unsink2dst_2_database.txt) transferred 10479186918 B (36 files) in 12089 s giving a data rate of 0.83 MB/s. The files were requested in the sequence they were written to tape.
    4. The fourth attempt (input file mdc2/unsink/unsink2dst_3.in, log files mdc2/unsink/unsink2dst_3.log and mdc2/unsink/unsink2dst_3_database.txt) transferred 2527805504 B (one file four times) in 253 s giving a data rate of 9.5 MB/s. This file was still in the HPSS disk cache.
    The data rates per file for unsinking (black triangles pointing upwards) and the accumulated average rate (red triangles pointing upwards) are shown in this figure.
Copying 10334693969 B (38 files) from rsun00:/diskA/brahms/mdc2/dst to rmds03:/disk30000/brahms/mdc2/dst took about 960 s giving a data rate of 10.4 MB/s.

The bramreco account did not have the permissions necessary to extract data from HPSS using the ORNL offline_submit command.

Data mining/analysis

The program Bminor3 (in ~alv/mdc2/brat_apps) reads a DST file, tries to match found tracks with Geant tracks, and writes a ROOT tree containing some event information (run number, event number, Z of the interaction vertex, time of interaction, number of tracks) and some track information (momentum, polar and azimuthal angles, intercepts at the vertex plane for the reconstructed and Geant tracks, and the fraction of assigned hits belonging to the Geant track). The 38 Geant jobs (323, 325-357 and 362-365) were analyzed 5 times each by submitting 190 jobs to the brahms_cas LSF queue. Four nodes were available: rcas0001, 0035, 0036 and 0135. The jobs were submitted Feb 22 at 4:36am and the last job finished at 8:05am (about 12600 s). 51.5 GB was read from DSTs (from /diskA/brahms/mdc2/dst) and 26.5 MB was written to uDSTs (to /diskA/brahms/mdc2/udst). The average input rate was 4 MB/s.
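The quoted 4 MB/s average input rate and the overall DST-to-uDST size reduction follow directly from the totals above (again assuming binary units, 1 GB = 2^30 B and 1 MB = 2^20 B):

```python
# Aggregate throughput and size reduction of the Bminor3 analysis pass.
# Assumes binary units (1 GB = 2**30 B, 1 MB = 2**20 B), matching the report.
GB, MB = 2**30, 2**20

dst_bytes = 51.5 * GB    # total read from DSTs
udst_bytes = 26.5 * MB   # total written to uDSTs
elapsed = 12600          # s, from first submission to last job finished

print(round(dst_bytes / elapsed / MB, 1))  # 4.2 -> average input rate in MB/s
print(round(dst_bytes / udst_bytes))       # 1990 -> roughly 2000-fold reduction
```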

Web pages accessing the RCF CRS databases

Four web pages were written to access information from the CRS databases. They are based on pages written for PHENIX. They require a Roxen Challenger server; Dave Morrison kindly gave us permission to use the PHENIX server during MDC2, but the pages are no longer accessible.
Alv Kjetil Holme
Created 2 March 1999. Modified 9 December 1999.