Summary of Mock Data Challenge 2
Sinking raw data into HPSS
Raw data was sunk into HPSS
from the bramsink account
using Tom Throwe's script raw_transfer_pftp.pl.
- Feb 20 (16:50-22:36)
on rcf.rhic.bnl.gov
using sink.pl
from rmds03:/disk30000/brahms/gbrahms_output
to rmds01:/home/bramsink/mdc2/gbrahms_output/5.
The 39 files (13515720288 B) were sunk 5 times in 10321 seconds
giving a data rate of 6.24 MB/s.
There were 5 errors during the transfer: two files needed two retries each
to succeed and one file needed one retry.
The input file was sink_mdc2_5.in and
the log files were sink_mdc2_5.log and
sink_mdc2_5_database.txt.
- Feb 21-22 (11:45-02:24)
on rsun00.rhic.bnl.gov
using sink2.pl
from rmds03:/diskA/brahms/mdc2/gbrahms_output
to rmds01:/home/bramsink/mdc2/gbrahms_output/6
and rmds01:/home/bramsink/mdc2/gbrahms_output/7.
- The first attempt (log file sink_mdc2_6_database_1.txt)
at sinking the 39 files took 1972 seconds,
giving a data rate of 6.5 MB/s.
- The second attempt
(input file sink_mdc2_6.in,
log files sink_mdc2_6.log and
sink_mdc2_6_database_5.txt)
crashed after a little more than 5 iterations
when more than 10 tries were needed to transfer a file.
67578601440 B in 10340 s gave a data rate of 6.2 MB/s.
There were 16 errors during the transfer.
The numbers of retries before success were 5, 7, 1, 1 and 2.
- The third attempt
(input file sink_mdc2_7.in,
log files sink_mdc2_7.log and
sink_mdc2_7_database.txt)
successfully sunk all 39 files 10 times,
135157202880 B in 22165 s, giving a data rate of 5.8 MB/s.
There were 12 errors.
The numbers of retries were 3, 2, 2, 1, 1, 1 and 2.
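The error totals and retry counts quoted above can be cross-checked against each other. The following is a minimal sketch of that check; the dictionary layout and the function name are illustrative only and are not taken from sink.pl or sink2.pl.

```python
# Error totals and per-file retry counts as quoted in the text above.
# The dictionary keys reuse the input file names purely as labels.
transfers = {
    "sink_mdc2_5": {"errors": 5, "retries": [2, 2, 1]},
    "sink_mdc2_6": {"errors": 16, "retries": [5, 7, 1, 1, 2]},
    "sink_mdc2_7": {"errors": 12, "retries": [3, 2, 2, 1, 1, 1, 2]},
}


def retries_match_errors(entry):
    """Return True if the retry counts add up to the reported error total."""
    return sum(entry["retries"]) == entry["errors"]


results = {name: retries_match_errors(entry) for name, entry in transfers.items()}

for name, ok in results.items():
    print(f"{name}: retries sum to errors -> {ok}")
```

For all three transfers the retry counts add up to the reported number of errors.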
The data rates per file for sinking (black triangles pointing downwards)
and the accumulated average rate (red triangles pointing downwards)
are shown in this figure.
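The data rates quoted in this section can be recomputed from the byte counts and elapsed times. The sketch below assumes that "MB" means 2**20 bytes; this is an inference from the numbers (with 10**6 bytes the first transfer would come out at about 6.55 MB/s rather than 6.24 MB/s), not something stated by the transfer scripts. The labels are illustrative.

```python
MIB = 2**20  # assumption: "MB" in the text means 2**20 bytes


def rate_mb_per_s(n_bytes, seconds):
    """Average transfer rate in MB/s (MB taken as 2**20 bytes)."""
    return n_bytes / seconds / MIB


# (label, bytes transferred, elapsed seconds), all taken from the text above.
runs = [
    ("sink_mdc2_5", 5 * 13515720288, 10321),
    ("sink_mdc2_6 attempt 1", 13515720288, 1972),
    ("sink_mdc2_6 attempt 2", 67578601440, 10340),
    ("sink_mdc2_7", 135157202880, 22165),
]

rates = {label: rate_mb_per_s(n_bytes, seconds) for label, n_bytes, seconds in runs}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} MB/s")
```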
Konstantin Olchanski did some tests sinking data directly from a program
using FTP and from a RAM disk using raw_transfer_pftp.pl.
Data reconstruction
Changes to BRAT:
- BrCombineTrack::CombineT3T4T5:
The assignments to T4DetectorTracks and T5DetectorTracks were moved in front of
the test for DebugLevel()>0 so that the code also matches tracks
when debug printing is switched off.
This change was committed into the repository March 1, 1999.
- BrLocalTrackDC::Event:
The three Clear() calls at the end caused a memory leak and were
removed.
The segmentation fault described in the comment was
avoided by linking with the ROOT New library (-lNew).
The change has not been committed into the repository.
The simulation files 323, 325-357 and 362-365 were successfully digitized
and reconstructed. However, HPSS problems crashed several jobs.
Summary of reconstruction jobs:
- Batch 0 (files in mdc2/(jsf,log,rdo)/0)
Geant job | Input size [B] | Last event | Real time [h:mm:ss] | CPU time [s] | Output size [B] | Job script
|
327 | 812696648 | 20000 | 25:30:49 | 91659.52 | 631948569 | crsjf_011
|
329 | 478099952 | 20000 | 7:06:07 | 25487.31 | 314619237 | crsjf_012
|
340 | 432393528 | 10000 | 41:20:11 | 148673.77 | 414355312 | crsjf_013
|
344 | 294831024 | 10000 | 8:25:46 | 30221.28 | 299541596 | crsjf_014
|
362 | 147844784 | 3300 | 22:22:52 | 80471.33 | 201642658 | crsjf_016
|
364 | 358506552 | 8000 | 70:06:47 | 251970.60 | 509741741 | crsjf_018
|
365 | 357957424 | 8000 | 70:16:29 | 252736.86 | 506345223 | crsjf_019
|
CRS jobs 15 and 17 were killed at the end of MDC2.
- Batch 1 (files in mdc2/(jsf,log,rdo)/1)
Geant job | Input size [B] | Last event | Real time [h:mm:ss] | CPU time [s] | Output size [B] | Job script
|
330 | 287187636 | 11700 | 4:29:44 | 16156.89 | 189753781 | crsjf_020
|
331 | 492232328 | 20000 | 7:23:24 | 26547.81 | 329796382 | crsjf_021
|
332 | 479139808 | 20000 | 7:11:21 | 25788.59 | 316003946 | crsjf_022
|
333 | 225664408 | 10100 | 1:20:04 | 4790.38 | 129665944 | crsjf_023
|
334 | 444890424 | 20000 | 3:28:14 | 12459.76 | 258471506 | crsjf_024
|
335 | 247014352 | 10000 | 5:49:40 | 20897.09 | 162479291 | crsjf_025
|
336 | 247014352 | 10000 | 5:27:01 | 19595.02 | 162479940 | crsjf_026
|
337 | 171907256 | 6900 | 3:01:24 | 10870.73 | 112539840 | crsjf_027
|
338 | 244285184 | 10000 | 3:52:28 | 13914.60 | 162559239 | crsjf_028
|
339 | 244144168 | 10000 | 4:03:14 | 14539.75 | 162573364 | crsjf_029
|
- Batch 2 (files in mdc2/(jsf,log,rdo)/2)
Geant job | Input size [B] | Last event | Real time [h:mm:ss] | CPU time [s] | Output size [B] | Job script
|
325 | 115339264 | 1000 | 3:23:04 | 12153.75 | 78051922 | crsjf_065
|
326 | 641893076 | 6000 | 14:35:20 | 52367.34 | 428530365 | crsjf_066
|
327 | 812696648 | 20000 | 25:41:38 | 92248.45 | 631948677 | crsjf_030
|
328 | 466779644 | 13500 | 18:05:14 | 65036.43 | 440055173 | crsjf_031
|
329 | 478099952 | 20000 | 5:02:28 | 18108.72 | 314620795 | crsjf_032
|
330 | 287187636 | 11700 | 4:33:48 | 16374.49 | 189753998 | crsjf_033
|
331 | 492232328 | 20000 | 10:02:28 | 36052.70 | 329797669 | crsjf_034
|
332 | 479139808 | 20000 | 5:06:53 | 18370.23 | 316005628 | crsjf_035
|
333 | 225664408 | 10100 | 1:20:16 | 4796.85 | 129664847 | crsjf_036
|
334 | 444890424 | 20000 | 3:29:35 | 12544.57 | 258470829 | crsjf_037
|
335 | 247014352 | 10000 | 5:45:27 | 20671.54 | 162480195 | crsjf_038
|
336 | 247014352 | 10000 | 5:46:37 | 20738.11 | 162479443 | crsjf_039
|
337 | 171907256 | 6900 | 3:10:23 | 11384.35 | 112539988 | crsjf_040
|
338 | 244285184 | 10000 | 4:08:53 | 14877.20 | 162559692 | crsjf_041
|
339 | 244144168 | 10000 | 4:04:32 | 14607.71 | 162572857 | crsjf_042
|
340 | 432393528 | 10000 | 43:50:37 | 157441.66 | 414355803 | crsjf_043
|
341 | 429009592 | 10000 | 43:29:09 | 156246.17 | 433597979 | crsjf_044
|
342 | 429361920 | 10000 | 44:43:25 | 160635.13 | 433304684 | crsjf_045
|
343 | 432579752 | 10000 | 43:32:21 | 156394.72 | 416439415 | crsjf_046
|
344 | 294831024 | 10000 | 8:24:56 | 30224.66 | 299542591 | crsjf_047
|
345 | 297765532 | 10000 | 8:41:49 | 31232.15 | 301146157 | crsjf_048
|
346 | 291193176 | 10000 | 7:59:26 | 28685.13 | 296403391 | crsjf_049
|
347 | 524861080 | 4900 | 24:52:38 | 89471.48 | 379408014 | crsjf_050
|
348 | 189194240 | 7400 | 3:12:17 | 10461.08 | 128718685 | crsjf_051
|
349 | 188481536 | 7400 | 3:25:43 | 12302.09 | 127943778 | crsjf_052
|
350 | 193269760 | 7400 | 4:26:31 | 15943.69 | 133853025 | crsjf_053
|
351 | 441037776 | 16900 | 9:39:01 | 34661.63 | 309752173 | crsjf_054
|
352 | 235995444 | 9100 | 5:38:17 | 20236.07 | 163761532 | crsjf_055
|
353 | 236033676 | 9100 | 3:56:25 | 14140.90 | 163305517 | crsjf_056
|
354 | 236194096 | 9100 | 5:51:46 | 21065.52 | 164088200 | crsjf_057
|
355 | 235159952 | 9000 | 5:53:51 | 21201.36 | 162815349 | crsjf_058
|
356 | 236577652 | 9100 | 5:40:31 | 20331.97 | 164328518 | crsjf_059
|
357 | 230412140 | 8800 | 5:30:42 | 19771.35 | 160477547 | crsjf_060
|
362 | 147844784 | 3300 | 28:53:12 | 103806.35 | 201643654 | crsjf_061
|
363 | 147559800 | 3300 | 22:40:52 | 80531.70 | 205269215 | crsjf_062
|
365 | 357957424 | 8000 | 66:33:42 | 239366.12 | 506345735 | crsjf_064
|
CRS job 63 was killed at the end of MDC2.
- Batch 3 (files in mdc2/(jsf,log,rdo)/3)
Geant job | Input size [B] | Last event | Real time [h:mm:ss] | CPU time [s] | Output size [B] | Job script
|
323 | 300878776 | 10000 | 5:52:39 | 21034.19 | 378919547 | crsjf_101
|
325 | 115339264 | 1000 | 3:22:23 | 12083.79 | 78052017 | crsjf_103
|
326 | 641893076 | 6000 | 19:36:07 | 70348.25 | 428530156 | crsjf_104
|
327 | 812696648 | 20000 | 24:33:38 | 88257.14 | 631951376 | crsjf_105
|
328 | 466779644 | 13500 | 13:52:24 | 49839.71 | 440056997 | crsjf_106
|
329 | 478099952 | 20000 | 5:08:29 | 18451.97 | 314618419 | crsjf_107
|
330 | 287187636 | 11700 | 4:37:23 | 16596.37 | 189754628 | crsjf_108
|
331 | 492232328 | 20000 | 10:43:12 | 38442.23 | 329795823 | crsjf_109
|
332 | 479139808 | 20000 | 7:03:31 | 25293.01 | 316004730 | crsjf_110
|
333 | 225664408 | 10100 | 1:22:20 | 4862.52 | 129666206 | crsjf_111
|
334 | 444890424 | 20000 | 4:50:30 | 17340.30 | 258471045 | crsjf_112
|
335 | 247014352 | 10000 | 5:50:53 | 20990.90 | 162479614 | crsjf_113
|
336 | 247014352 | 10000 | 6:05:00 | 21822.17 | 162480583 | crsjf_114
|
337 | 171907256 | 6900 | 3:09:54 | 11338.37 | 112540285 | crsjf_115
|
338 | 244285184 | 10000 | 3:55:14 | 14092.06 | 162558648 | crsjf_116
|
339 | 244144168 | 10000 | 3:01:57 | 10885.75 | 162573400 | crsjf_117
|
342 | 429361920 | 10000 | 32:48:30 | 117972.41 | 433302190 | crsjf_120
|
344 | 294831024 | 10000 | 8:42:54 | 31273.66 | 299542787 | crsjf_122
|
345 | 297765532 | 10000 | 8:50:10 | 31703.48 | 301145672 | crsjf_123
|
346 | 291193176 | 10000 | 8:31:02 | 30567.63 | 296402262 | crsjf_124
|
347 | 524861080 | 4900 | 25:47:52 | 92658.01 | 379409450 | crsjf_125
|
348 | 189194240 | 7400 | 4:07:41 | 14764.03 | 128719154 | crsjf_126
|
349 | 188481536 | 7400 | 2:25:51 | 8742.15 | 127943799 | crsjf_127
|
350 | 193269760 | 7400 | 4:29:11 | 16103.57 | 133853221 | crsjf_128
|
351 | 441037776 | 16900 | 9:57:00 | 35690.20 | 309751530 | crsjf_129
|
352 | 235995444 | 9100 | 5:43:49 | 20587.29 | 163762091 | crsjf_130
|
353 | 236033676 | 9100 | 5:21:12 | 19213.96 | 163305447 | crsjf_131
|
354 | 236194096 | 9100 | 4:27:16 | 16003.00 | 164087915 | crsjf_132
|
355 | 235159952 | 9000 | 5:56:46 | 21370.69 | 162815594 | crsjf_133
|
356 | 236577652 | 9100 | 5:48:00 | 20798.27 | 164329073 | crsjf_134
|
357 | 230412140 | 8800 | 5:28:14 | 19638.51 | 160477218 | crsjf_135
|
362 | 147844784 | 3300 | 22:15:17 | 80032.10 | 201643500 | crsjf_136
|
363 | 147559800 | 3300 | 29:27:03 | 105956.77 | 205270411 | crsjf_137
|
CRS jobs 102, 118, 119, 138 and 139 were killed at the end of MDC2.
The differences in CPU time for the same Geant job are caused
by the different CPU speeds of the CRS nodes (350 MHz and 450 MHz).
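The real-time and CPU-time columns of the batch tables can be compared directly once the real time is converted to seconds. The sketch below does this for Geant job 327 in batch 0. It assumes that the real time is given as h:mm:ss and that the "Last event" value equals the number of events processed; the function and variable names are illustrative.

```python
def hms_to_seconds(text):
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Batch 0, Geant job 327 (values copied from the table above).
real_s = hms_to_seconds("25:30:49")
cpu_s = 91659.52
events = 20000  # assumption: "Last event" is the number of events processed

cpu_fraction = cpu_s / real_s    # share of the wall-clock time spent on CPU
cpu_per_event = cpu_s / events   # CPU seconds per event

print(f"real time: {real_s} s")
print(f"CPU / real: {cpu_fraction:.3f}")
print(f"CPU per event: {cpu_per_event:.2f} s")
```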
Some of the HPSS file attributes (file name, file size, tape ID and position)
of the DST files are given in the file
~bramreco/mdc2/log/HPSS_file_attribs.txt.
CPU time (in seconds) used by the different BRAT modules for batch 3
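Geant job | Digitize BB | Digitize T3 | Digitize T4 | Digitize T5 | Digitize TPC1 | Digitize TPC2 | Digitize TPM1 | Digitize TPM2 | Local Tracking T3 | Local Tracking T4 | Local Tracking T5 | Local Tracking TPC2 | Local Tracking TPM1 | Local Tracking TPM2
|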
323 | 33 | 1 | 0 | 0 | 1 | 0 | 4619 | 56 | 1 | 0 | 0 | 3 | 13871 | 551
|
325 | 11 | 8 | 3 | 2 | 2666 | 194 | 3 | 3 | 3358 | 3450 | 1491 | 475 | 8 | 8
|
326 | 63 | 38 | 14 | 9 | 14234 | 951 | 4 | 3 | 21762 | 23240 | 5090 | 2518 | 8 | 8
|
327 | 59 | 24 | 9 | 7 | 4105 | 328 | 9251 | 107 | 28785 | 12907 | 3000 | 1508 | 24160 | 908
|
328 | 32 | 17 | 7 | 5 | 2543 | 264 | 5040 | 58 | 15208 | 7573 | 3220 | 984 | 12744 | 421
|
329 | 48 | 10 | 3 | 2 | 889 | 79 | 1 | 0 | 11496 | 3496 | 797 | 585 | 3 | 3
|
330 | 33 | 9 | 4 | 3 | 724 | 90 | 0 | 0 | 9996 | 3001 | 1321 | 606 | 3 | 3
|
331 | 65 | 20 | 8 | 6 | 1413 | 184 | 1 | 1 | 24256 | 7778 | 1927 | 1224 | 6 | 6
|
332 | 65 | 15 | 5 | 3 | 1174 | 116 | 1 | 1 | 14272 | 6262 | 1107 | 812 | 6 | 5
|
333 | 33 | 5 | 2 | 1 | 144 | 18 | 0 | 0 | 2779 | 762 | 363 | 192 | 3 | 2
|
334 | 64 | 9 | 3 | 3 | 274 | 34 | 1 | 1 | 12177 | 2808 | 489 | 362 | 5 | 5
|
335 | 29 | 8 | 3 | 3 | 651 | 77 | 0 | 0 | 14491 | 3626 | 901 | 504 | 2 | 2
|
336 | 33 | 9 | 4 | 3 | 718 | 85 | 0 | 0 | 14806 | 3828 | 942 | 614 | 3 | 3
|
337 | 23 | 7 | 3 | 2 | 489 | 63 | 0 | 0 | 5760 | 3131 | 961 | 371 | 2 | 2
|
338 | 32 | 10 | 4 | 3 | 745 | 87 | 0 | 0 | 7129 | 3768 | 1056 | 508 | 3 | 2
|
339 | 23 | 7 | 2 | 2 | 546 | 67 | 0 | 0 | 5720 | 2880 | 706 | 387 | 1 | 2
|
342 | 25 | 0 | 0 | 0 | 0 | 0 | 38613 | 190 | 0 | 0 | 0 | 2 | 76524 | 886
|
344 | 32 | 0 | 0 | 0 | 1 | 0 | 8620 | 79 | 0 | 0 | 0 | 3 | 20433 | 589
|
345 | 32 | 0 | 0 | 0 | 1 | 0 | 8615 | 81 | 0 | 0 | 0 | 3 | 20870 | 580
|
346 | 31 | 0 | 0 | 0 | 1 | 0 | 8388 | 77 | 0 | 0 | 0 | 3 | 20007 | 569
|
347 | 52 | 1 | 0 | 0 | 0 | 0 | 28249 | 197 | 8 | 8 | 8 | 10 | 61037 | 1005
|
348 | 24 | 8 | 3 | 2 | 792 | 81 | 0 | 0 | 8274 | 3808 | 654 | 495 | 3 | 2
|
349 | 18 | 5 | 2 | 1 | 592 | 60 | 0 | 0 | 4972 | 2084 | 269 | 308 | 1 | 1
|
350 | 24 | 9 | 4 | 3 | 925 | 113 | 0 | 0 | 7366 | 4752 | 1692 | 580 | 2 | 2
|
351 | 55 | 21 | 9 | 7 | 2071 | 256 | 1 | 0 | 17870 | 9331 | 3310 | 1297 | 5 | 4
|
352 | 30 | 12 | 5 | 4 | 1121 | 134 | 0 | 0 | 9694 | 6111 | 1989 | 706 | 3 | 3
|
353 | 30 | 12 | 5 | 3 | 1110 | 127 | 0 | 0 | 9604 | 5474 | 1414 | 662 | 3 | 3
|
354 | 21 | 8 | 4 | 2 | 827 | 99 | 0 | 0 | 9201 | 3947 | 852 | 479 | 1 | 1
|
355 | 29 | 11 | 5 | 3 | 1109 | 126 | 1 | 0 | 10764 | 5839 | 2047 | 661 | 3 | 3
|
356 | 30 | 11 | 5 | 4 | 1130 | 131 | 0 | 0 | 11001 | 5686 | 1322 | 694 | 3 | 2
|
357 | 26 | 10 | 4 | 3 | 1003 | 121 | 0 | 0 | 9692 | 5959 | 1493 | 613 | 2 | 2
|
362 | 0 | 0 | 0 | 0 | 0 | 0 | 28059 | 84 | 9 | 9 | 9 | 10 | 50723 | 418
|
363 | 1 | 0 | 0 | 0 | 0 | 0 | 37565 | 104 | 9 | 9 | 9 | 9 | 66669 | 549
|
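Each row of the table above holds the Geant job number followed by 14 CPU times. The sketch below splits one row into named columns, assuming that the first eight values belong to the digitization modules and the last six to the local tracking modules, in the order given in the table header. The function and variable names are illustrative.

```python
# Assumed column grouping: 8 digitization modules, then 6 local tracking modules.
DIGITIZE = ["BB", "T3", "T4", "T5", "TPC1", "TPC2", "TPM1", "TPM2"]
TRACKING = ["T3", "T4", "T5", "TPC2", "TPM1", "TPM2"]


def parse_row(line):
    """Split one pipe-delimited table row into (job, digitize, tracking)."""
    cells = [int(cell) for cell in line.split("|")]
    job, values = cells[0], cells[1:]
    if len(values) != len(DIGITIZE) + len(TRACKING):
        raise ValueError(f"expected 14 module columns, got {len(values)}")
    digitize = dict(zip(DIGITIZE, values[:len(DIGITIZE)]))
    tracking = dict(zip(TRACKING, values[len(DIGITIZE):]))
    return job, digitize, tracking


row = "323 | 33 | 1 | 0 | 0 | 1 | 0 | 4619 | 56 | 1 | 0 | 0 | 3 | 13871 | 551"
job, digitize, tracking = parse_row(row)
module_total = sum(digitize.values()) + sum(tracking.values())

print(job, digitize["TPM1"], tracking["TPM1"], module_total)
```

For Geant job 323 the listed modules add up to 19136 s, compared with a total CPU time of 21034.19 s for the same job in the batch 3 table.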
Unsinking reconstructed data from HPSS
Reconstructed data was extracted from HPSS
to the bramreco account
using Tom Throwe's script dst_get_pftp.pl.
- Feb 20
on rcf.rhic.bnl.gov
from rmds01:/home/bramreco/mdc2/rdo/0
and rmds01:/home/bramreco/mdc2/rdo/2
to rmds03:/disk30000/brahms/mdc2/dst.
The 32 files (8.2 GB) were extracted in 8580 s
giving a data rate of 0.95 MB/s.
- Feb 21-22 (14:00-03:10)
on rsun00.rhic.bnl.gov
using unsink2dst.pl
from rmds01:/home/bramreco/mdc2/rdo/0,
rmds01:/home/bramreco/mdc2/rdo/2
and rmds01:/home/bramreco/mdc2/rdo/3
to rsun00:/diskA/brahms/mdc2/dst.
- The first attempt
(input file mdc2/unsink/unsink2dst_0.in,
log files mdc2/unsink/unsink2dst_0.log and
mdc2/unsink/unsink2dst_0_database.txt)
transferred 1138530850 B (3 files) in 553 s
giving a data rate of 1.96 MB/s.
One of the files that were in the HPSS disk cache
had a data rate of 7.6 MB/s.
- The second attempt
(input file mdc2/unsink/unsink2dst_1.in,
log files mdc2/unsink/unsink2dst_1.log and
mdc2/unsink/unsink2dst_1_database.txt)
transferred 21079161630 B (82 files) in 24014 s
giving a data rate of 0.84 MB/s.
The files were requested in the sequence they were written to tape.
- The third attempt
(input file mdc2/unsink/unsink2dst_2.in,
log files mdc2/unsink/unsink2dst_2.log and
mdc2/unsink/unsink2dst_2_database.txt)
transferred 10479186918 B (36 files) in 12089 s
giving a data rate of 0.83 MB/s.
The files were requested in the sequence they were written to tape.
- The fourth attempt
(input file mdc2/unsink/unsink2dst_3.in,
log files mdc2/unsink/unsink2dst_3.log and
mdc2/unsink/unsink2dst_3_database.txt)
transferred 2527805504 B (one file four times) in 253 s
giving a data rate of 9.5 MB/s.
This file was still in the HPSS disk cache.
The data rates per file for unsinking (black triangles pointing upwards)
and the accumulated average rate (red triangles pointing upwards)
are shown in this figure.
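In the second and third attempts the files were requested in the sequence they were written to tape. One way to build such a request list is sketched below. It assumes that each file record carries a tape ID and a position on that tape (the attributes listed for the DST files), and that sorting by these two values reproduces the write order. The record type, file names and values are hypothetical and are not taken from unsink2dst.pl.

```python
from typing import NamedTuple


class TapeFile(NamedTuple):
    """Hypothetical record for one DST file stored in HPSS."""
    name: str
    tape_id: str
    position: int


def tape_order(files):
    """Sort files by tape ID, then by position on the tape."""
    return sorted(files, key=lambda f: (f.tape_id, f.position))


# Made-up example records, for illustration only.
files = [
    TapeFile("c.dst", "T002", 1),
    TapeFile("a.dst", "T001", 7),
    TapeFile("b.dst", "T001", 3),
]

ordered = [f.name for f in tape_order(files)]
print(ordered)
```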
Copying 10334693969 B (38 files)
from rsun00:/diskA/brahms/mdc2/dst
to rmds03:/disk30000/brahms/mdc2/dst
took about 960 s giving a data rate of 10.4 MB/s.
The bramreco account didn't have the permission necessary to
extract data from HPSS using the ORNL offline_submit command.
Data mining/analysis
The program Bminor3 (in ~alv/mdc2/brat_apps)
reads a DST file,
tries to match found tracks with Geant tracks and
writes a ROOT tree containing
- some event information
(run number, event number, Z of the interaction vertex,
time of interaction, number of tracks) and
- some track information
(momentum, polar and azimuthal angles,
intercepts at the vertex plane for the reconstructed track and the Geant tracks, and
the fraction of assigned hits belonging to the Geant track).
The 38 Geant jobs (323, 325-357 and 362-365) were analyzed 5 times each by
submitting 190 jobs to the brahms_cas LSF queue.
Four nodes were available: rcas0001, rcas0035, rcas0036
and rcas0135.
The jobs were submitted Feb 22 at 4:36am and the last job finished at 8:05am
(about 12600 s).
51.5 GB was read from DSTs (from /diskA/brahms/mdc2/dst) and
26.5 MB was written to uDSTs (to /diskA/brahms/mdc2/udst).
The average input rate was about 4 MB/s.
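The count of 190 LSF jobs follows from the list of Geant jobs and the five passes over each. The sketch below expands the job ranges quoted above and builds the (Geant job, pass) pairs; the function name and the pair layout are illustrative and do not reflect the actual submission script.

```python
def expand_jobs(spec):
    """Expand a spec such as '323, 325-357' into a list of job numbers."""
    jobs = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            jobs.extend(range(first, last + 1))
        else:
            jobs.append(int(part))
    return jobs


geant_jobs = expand_jobs("323, 325-357, 362-365")
passes = 5

# One LSF job per (Geant job, pass) pair.
lsf_jobs = [(job, n) for n in range(passes) for job in geant_jobs]

print(len(geant_jobs), len(lsf_jobs))
```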
Web pages accessing the RCF CRS databases
Four web pages were written to access information from the CRS databases.
They are based on pages written for
PHENIX.
They require a Roxen Challenger server
and Dave Morrison kindly gave us permission to use the PHENIX server during
MDC2, but the pages are no longer accessible.
Alv Kjetil Holme
Created 2 March 1999.
Modified 9 December 1999.