Raw data sinking

The step-by-step procedure

The step-by-step procedure for sinking raw data into the new Managed Data Server is:
  1. Log in as user bramsink on rmds03.rhic.bnl.gov.
  2. Create (or edit) a job specification file (see sink.in) that describes the files you want to transfer.
  3. Run the sink.pl script.

The job specification file

The job specification file is a plain text file. Each line has one of the following forms:
   inputdir=default input directory
   outputdir=default output directory
   throttle=average transfer rate (in bytes/second)
   inputfile [outputfile]
  
The lines are processed sequentially, so a command applies to the file names that follow it. Comments start with a hash mark (#). Commands and file names can be mixed in any order. File names can be relative or absolute. Each file name, and each component of a directory name, should be at most 255 characters long. The valid characters in file and directory names are the lower-case letters (a-z), the digits (0-9), the underscore (_) and the period (.). If no output file name is given, the name of the input file is used. The outputdir should not be an absolute path name.
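The naming rules above can be checked mechanically. The Python sketch below is illustrative only: the function names are hypothetical and not part of sink.pl, and it assumes that the 255-character limit applies to every '/'-separated component, including a file name given with a directory part.

```python
import re

# Hypothetical helpers, not part of sink.pl.
# Allowed: lower-case letters, digits, underscore and period,
# 1 to 255 characters per file name or directory component.
VALID_COMPONENT = re.compile(r"[a-z0-9_.]{1,255}")


def valid_component(name: str) -> bool:
    """Return True if a single file or directory name obeys the rules."""
    return VALID_COMPONENT.fullmatch(name) is not None


def valid_path(path: str) -> bool:
    """Check every '/'-separated component of a relative or absolute path."""
    # Empty pieces come from a leading or doubled slash and are ignored.
    components = [c for c in path.split("/") if c]
    return bool(components) and all(valid_component(c) for c in components)


def valid_outputdir(path: str) -> bool:
    """The outputdir must obey the name rules and must not be absolute."""
    return not path.startswith("/") and valid_path(path)
```

For example, valid_outputdir("mdc1/gbrahms_output") returns True, while "/mdc1/gbrahms_output" is rejected because it is absolute.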

An example of a job specification file used for tests on rmds03:

   inputdir=/disk0/brahms/gbrahms_output
   outputdir=mdc1/gbrahms_output
   throttle=2000000 #2MB/s
   sim_164.cdat
   sim_165.cdat
  
There are 29 GBRAHMS output files stored in the directory /disk0/brahms/gbrahms_output. This disk is mounted on rmds03 and rcas17. The job specification file sink.in in the bramsink home directory was used to sink these files several times.
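A minimal Python sketch of how such a file could be read, assuming that relative file names are joined to the current inputdir and outputdir, that each command affects only the file lines after it, and that a missing output name reuses the last component of the input name. The function parse_job_spec is hypothetical; the actual parsing is done by sink.pl and may differ in detail.

```python
import os


def parse_job_spec(text: str, default_throttle: int = 0):
    """Return a list of (input_path, output_path, throttle) tuples.

    Sketch only: settings apply to the file lines that follow them.
    """
    inputdir, outputdir, throttle = "", "", default_throttle
    transfers = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=" in line:  # a command such as inputdir=...
            key, value = (s.strip() for s in line.split("=", 1))
            if key == "inputdir":
                inputdir = value
            elif key == "outputdir":
                outputdir = value
            elif key == "throttle":
                throttle = int(value)  # bytes/second
            else:
                raise ValueError(f"unknown command: {key}")
            continue
        fields = line.split()  # "inputfile [outputfile]"
        if len(fields) not in (1, 2):
            raise ValueError(f"bad file line: {raw!r}")
        infile = fields[0]
        # Assumption: only the last path component of the input is reused.
        outfile = fields[1] if len(fields) == 2 else os.path.basename(infile)
        # os.path.join keeps an absolute infile as-is.
        transfers.append((os.path.join(inputdir, infile),
                          os.path.join(outputdir, outfile),
                          throttle))
    return transfers


EXAMPLE = """\
inputdir=/disk0/brahms/gbrahms_output
outputdir=mdc1/gbrahms_output
throttle=2000000 #2MB/s
sim_164.cdat
sim_165.cdat
"""

if __name__ == "__main__":
    for src, dst, rate in parse_job_spec(EXAMPLE):
        print(src, "->", dst, f"({rate} bytes/s)")
```

Applied to the example file, the first entry is ("/disk0/brahms/gbrahms_output/sim_164.cdat", "mdc1/gbrahms_output/sim_164.cdat", 2000000).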

The data sinking script

The data sinking script is invoked like this:
   sink.pl [switches] [--] file
  
where the switches and arguments are:
    -c num   number of copies to make
    -d       detailed output for debugging
    -p       toggle PFTP for transfer (default is ON)
    -t       toggle throttled transfer (default is ON at 1MB/s)
    -u       use GMT for timestamp
    file     name of job specification file
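The switch semantics, in particular the toggles and the optional "--" that ends the switch list, can be illustrated with Python's standard getopt module. This is a sketch only: sink.pl is a Perl script, and parse_switches and its default of one copy are assumptions.

```python
import getopt


def parse_switches(argv):
    """Hypothetical Python rendering of the sink.pl command line.

    Assumed defaults: one copy, PFTP on, throttling on, local time.
    """
    opts, args = getopt.getopt(argv, "c:dptu")  # "--" stops switch parsing
    settings = {"copies": 1, "debug": False, "pftp": True,
                "throttle": True, "gmt": False}
    for flag, value in opts:
        if flag == "-c":
            settings["copies"] = int(value)
        elif flag == "-d":
            settings["debug"] = True
        elif flag == "-p":
            settings["pftp"] = not settings["pftp"]  # toggle
        elif flag == "-t":
            settings["throttle"] = not settings["throttle"]  # toggle
        elif flag == "-u":
            settings["gmt"] = True
    if len(args) != 1:
        raise getopt.GetoptError("exactly one job specification file expected")
    return settings, args[0]
```

Because -p and -t toggle rather than set, giving the same switch twice restores its default.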
  
If the copy switch (-c) is given, all files are first transferred once under their usual names; on each subsequent transfer an extra extension (.001, .002, ...) is appended to the output file name.
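The naming scheme for copies amounts to the following. The sketch assumes that the number given with -c counts the first, un-suffixed transfer as well; copy_names is a hypothetical helper.

```python
def copy_names(output_name: str, copies: int) -> list[str]:
    """Output names for `copies` transfers of one file.

    Assumption: the first transfer keeps the plain name and counts as a copy.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Later transfers get a zero-padded three-digit extension.
    return [output_name] + [f"{output_name}.{i:03d}" for i in range(1, copies)]
```

For example, copy_names("sim_164.cdat", 3) returns ["sim_164.cdat", "sim_164.cdat.001", "sim_164.cdat.002"].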

A summary of the file transfers is written to the file sink_database.txt.

A poor man's throttle has been implemented to avoid filling the disk cache on the new Managed Data Server. The script transfers each file at full speed with (P)FTP and then sleeps long enough to bring the average transfer rate down to the throttle setting.
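The sleep time follows from simple arithmetic: after B bytes have been moved in T seconds, reaching an average rate R requires waiting B/R - T seconds, or not at all if that is negative. The Python sketch below assumes the average is taken over the running total for the whole job rather than per file; throttle_sleep, throttled_transfer and the transfer callback are hypothetical stand-ins for what sink.pl does around raw_transfer.pl or raw_transfer_pftp.pl.

```python
import time


def throttle_sleep(total_bytes: int, elapsed: float, rate: float) -> float:
    """Seconds to sleep so that total_bytes / (elapsed + sleep) <= rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return max(0.0, total_bytes / rate - elapsed)


def throttled_transfer(files, transfer, rate):
    """Hypothetical driver: transfer(name) copies one file at full speed and
    returns the number of bytes moved; the pause uses the job's running total."""
    start = time.monotonic()
    total = 0
    for name in files:
        total += transfer(name)
        time.sleep(throttle_sleep(total, time.monotonic() - start, rate))
    return total
```

For example, if 100,000,000 bytes were moved in 10 s with a throttle of 2,000,000 bytes/s, throttle_sleep(100_000_000, 10.0, 2_000_000) gives 40.0, so the job pauses 40 s and averages 2 MB/s over 50 s.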

The files are transferred by the scripts raw_transfer.pl or raw_transfer_pftp.pl written by Tom Throwe.


Alv Kjetil Holme
Modified 29 September 1998