Raid-to-Silo Transfer Strategy

From GlueXWiki
Revision as of 16:26, 24 October 2013 by Wolin

Below is a proposal for a raid-to-silo transfer strategy for moving Hall D data files from our local raid server to the JLab tape storage facility. We will update this as our ideas develop.

Elliott Wolin
Dave Lawrence
24-Oct-2013


Notes

  • We will use the jmirror facility from the Computer Center to transfer the files.
  • jmirror deletes the link to the file when the transfer is complete. It does not delete directories, only files.
  • jmirror is fairly smart and reliable. It deletes the hard link only once the file has been safely transferred.
  • jmirror is run periodically via a cron job; it is not a transfer server system. Each time it runs, it transfers the files it finds.
  • jmirror will not transfer files actively being written to, nor transfer files twice if invoked twice.
  • Additional hard links to the data file are untouched by jmirror. These can be used to keep the file on disk after transfer.
  • If files are kept, they must be deleted in time to make room for new DAQ files. This will require a cleanup strategy and cron scripts to implement it.
  • The DAQ creates a 10 GB file every 30 seconds, or roughly 1.2 TB/hour. A two-hour run thus generates about 2.4 TB.
  • It is preferable to transfer files as they are ready for transfer, and not wait for the run to end before initiating transfer.
  • The simplest way to implement immediate transfer is for run control to run a script every time the ER closes a file.
  • Vardan and Carl are working out a simple scheme to allow users to specify such a script and have it run when a file is closed.
  • Mark I prefers to store files by "run period" with a simple naming scheme (RunPeriod001, RunPeriod002 or similar).
  • Run periods are just date ranges. Run numbers will NOT be reused, i.e. all run numbers are unique across all run periods.
  • Due to constraints in the mss, a second level of directories is needed. Mark and I propose simply organizing files by run, e.g. Run000001, Run000002, etc.
  • Run files will have the run number in their names, e.g. Run000001.evio.001, Run000001.evio.002, etc.
  • A two-hour run will generate around 250 files.
  • RAID disk partitions do not seem to be needed (see below); they can be implemented later if necessary.
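
The naming scheme and the hard-link idea in the notes above can be sketched as a small script of the kind run control might call each time the ER closes a file. This is only a sketch under stated assumptions: the function names, the arguments, and the idea of a separate staging tree that jmirror scans are all hypothetical, and the exact zero-padding of the names follows the examples above rather than any agreed convention. The script hard-links the closed file into RunPeriodNNN/RunNNNNNN/ and leaves the original link in place, so the data stay on disk after the transfer removes the staged link.

```python
import os
from pathlib import Path


def staged_path(root, run_period, run_number, file_index):
    """Build root/RunPeriodNNN/RunNNNNNN/RunNNNNNN.evio.NNN (hypothetical layout)."""
    run = f"Run{run_number:06d}"
    return (Path(root) / f"RunPeriod{run_period:03d}" / run
            / f"{run}.evio.{file_index:03d}")


def stage_closed_file(data_file, staging_root, run_period, run_number, file_index):
    """Hard-link a closed DAQ file into the staging tree and return the new path.

    Assumption: staging_root is the directory tree the transfer tool scans.
    Hard links only work within one filesystem, so staging_root must live on
    the same RAID volume as data_file.
    """
    target = staged_path(staging_root, run_period, run_number, file_index)
    target.parent.mkdir(parents=True, exist_ok=True)
    os.link(data_file, target)  # second name for the same data, no copy made
    return target
```

Because the staged name is only a second link to the same data, removing it after transfer frees no disk space until the original link is also removed; that is what makes a separate cleanup step necessary.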


Notes for Dec 2013 Online Data Challenge

  • We plan to use a basic automated file transfer mechanism in Dec that deletes files on transfer. If someone has the time, we'll try just-in-time deletion.
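
For reference, just-in-time deletion could look roughly like the sketch below, which a cron job might run. It is not a worked-out design. It assumes that each retained file started with exactly two hard links (the retained one and the one handed to jmirror), so a link count of 1 means jmirror has removed its link and the file is safely on tape. The function names, the directory argument, and the free-space threshold are all hypothetical.

```python
import shutil
from pathlib import Path


def deletion_candidates(retained_dir):
    """Return (mtime, size, path) for files that look transferred, oldest first."""
    candidates = []
    for path in Path(retained_dir).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        # Assumption: link count 1 means the transfer link has been removed,
        # i.e. the file was transferred. Files with more links are skipped.
        if st.st_nlink == 1:
            candidates.append((st.st_mtime, st.st_size, path))
    candidates.sort(key=lambda c: c[0])
    return candidates


def free_space(retained_dir, min_free_bytes, free_bytes=None):
    """Delete the oldest transferred files until min_free_bytes is available.

    free_bytes may be passed in for testing; by default it is read from the
    filesystem that holds retained_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(retained_dir).free
    deleted = []
    for _mtime, size, path in deletion_candidates(retained_dir):
        if free_bytes >= min_free_bytes:
            break
        path.unlink()
        free_bytes += size
        deleted.append(path)
    return deleted
```

Deleting oldest-first keeps the most recent runs on disk for as long as space allows, and skipping files that still have a second link means nothing is removed before it has been transferred.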


Proposal