HOWTO archive a large directory of files to the tape library

From GlueXWiki

Backups of disk directories at JLab can be created with a script in the hd_utilities repository. The script creates a multi-volume tar archive of a specified directory on the write-through cache, and the resulting large volume files are automatically archived to tape. Because every file under the original directory is packed into the tar archive, small files in the directory tree are archived as well and, when restored, appear in their original relative locations.

The script is

 $HD_UTILITIES_HOME/tar_multi/disk_to_tape_backup.sh

It takes three positional arguments:

source_dir=$1 # directory name to be archived with full path
tar_multi=$2 # script to guide tar multi-volume archive creation and extraction, with full path
size=$3 # maximum size of each tar file volume (suffix: G, M, or k)

The script creates a multi-volume tar archive in a new directory on the write-through cache disk. The directory created under /cache is

 /cache/halld/home/backups/$source_dir

where $source_dir is the directory that was used as the first argument to the script (see above). In addition to the multi-volume tar archive, three other files are created in this directory:

  1. $tar_multi: the script used to guide tar (basename only)
  2. README: instructions for how to extract the tar archive
  3. MANIFEST: a listing of the archive volume files, followed by a listing of the files contained in the archive
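
The role of the guide script can be illustrated with a minimal sketch. When GNU tar is run with --multi-volume and -F (--info-script), it calls the script at the end of each volume with the environment variables TAR_ARCHIVE, TAR_VOLUME, TAR_SUBCOMMAND and TAR_FD set, and it reads the name of the next volume from the file descriptor given in TAR_FD. The script below is not the contents of tar_multi_3.sh: the function name next_volume is hypothetical, and the ":N" volume naming is an assumption inferred from the MANIFEST listing shown further down this page.

```shell
#!/bin/bash
# Illustrative new-volume script for GNU tar (-F / --info-script).
# This is NOT tar_multi_3.sh; the ":N" naming is an assumption.
# GNU tar sets TAR_ARCHIVE, TAR_VOLUME, TAR_SUBCOMMAND and TAR_FD
# in the environment before it runs the script.

next_volume() {
    # Archive name with any ":N" suffix from a previous volume removed.
    local base=${TAR_ARCHIVE%:[0-9]*}
    # TAR_VOLUME is the ordinal number of the volume tar is about to start.
    local next="$base:$TAR_VOLUME"

    echo "Preparing volume $TAR_VOLUME of $base."

    case $TAR_SUBCOMMAND in
        -c) ;;                                    # creating: a new name is fine
        -d|-x|-t) [ -r "$next" ] || return 1 ;;   # reading: the volume must exist
        *) return 1 ;;
    esac

    # Hand the name of the next volume back to tar.
    echo "$next" >&"$TAR_FD"
}

# Act only when called by tar (TAR_FD set), so the file can also be sourced.
if [ -n "${TAR_FD:-}" ]; then
    next_volume || exit 1
fi
```

A non-zero exit status from the script makes tar stop, which is why a missing volume during extraction is reported as a failure rather than silently skipped.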

For example, the command:

$HD_UTILITIES_HOME/tar_multi/disk_to_tape_backup.sh \
  /work/halld/home/mpatsyuk/dirc/TImap1 \
  $HD_UTILITIES_HOME/tar_multi/tar_multi_3.sh \
  20G

results in the directory

/cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1

In that directory the README says:

Tue May 21 11:19:36 EDT 2019
To restore files:
tar xvf /cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1/TImap1.tar -F /cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1/tar_multi_3.sh --multi-volume
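
The file names stored in the archive are relative (TImap1/..., as the MANIFEST below shows), so the README command recreates the TImap1 directory under whatever directory it is run from. A minimal sketch of a wrapper that restores into an explicitly chosen directory follows. The function restore_backup and its arguments are hypothetical and not part of hd_utilities; it assumes GNU tar, that the first volume is named after the basename of the backup directory, and that all volumes are present on disk.

```shell
#!/bin/bash
# Hypothetical helper: restore a backup into a chosen target directory.
# Uses the same tar options as the README, plus -C to select the target.
restore_backup() {
    local backup_dir=$1   # absolute path of the backup directory under /cache
    local tar_multi=$2    # basename of the guide script stored in that directory
    local target=$3       # directory in which the archived tree is recreated
    local name

    name=$(basename "$backup_dir")
    mkdir -p "$target" || return 1

    # Absolute paths are used for the archive and the script because
    # -C makes tar change directory before extracting.
    tar xvf "$backup_dir/$name.tar" \
        -F "$backup_dir/$tar_multi" \
        --multi-volume \
        -C "$target"
}
```

For the example above, a call could look like this (the target directory is a placeholder):

restore_backup /cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1 tar_multi_3.sh /path/to/restore/area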

The MANIFEST says:

Tue May 21 11:19:36 EDT 2019
/cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1
total 4192164616
-rw-rw-r-- 1 gluex halld-2          92 May 21 11:19 MANIFEST
-rw-rw-r-- 1 gluex halld-2         225 May 21 11:19 README
-rwxrwxr-x 1 gluex halld-2         636 May 20 14:19 tar_multi_3.sh
-rw-rw-r-- 1 gluex halld-2 21474836480 May 20 14:25 TImap1.tar
-rw-rw-r-- 1 gluex halld-2 21474836480 May 20 15:23 TImap1.tar:10
-rw-rw-r-- 1 gluex halld-2 21474836480 May 21 01:10 TImap1.tar:100
-rw-rw-r-- 1 gluex halld-2 21474836480 May 21 01:16 TImap1.tar:101
...
-rw-rw-r-- 1 gluex halld-2 21474836480 May 21 00:59 TImap1.tar:98
-rw-rw-r-- 1 gluex halld-2 21474836480 May 21 01:04 TImap1.tar:99
tar file contents:
drwxr-sr-x mpatsyuk/halld-2  0 2018-10-16 17:34 TImap1/
-rw-r--r-- mpatsyuk/halld-2 6943 2018-10-09 22:46 TImap1/pdf_x-69.0_y-53.0_th6.25786_phi-146.76.root
-rw-r--r-- mpatsyuk/halld-2 2119303361 2018-10-10 02:27 TImap1/kapi_x-61.0_y-45.0_th5.04162_phi-149.508.root
-rw-r--r-- mpatsyuk/halld-2       6935 2018-10-10 21:09 TImap1/pdf_x-25.0_y57.0_th8.41465_phi92.678.root
...
-rw-r--r-- mpatsyuk/halld-2 1719255521 2018-10-09 12:34 TImap1/kapi_x-93.0_y-95.0_th10.9009_phi-134.664.root
Preparing volume 2 of /cache/halld/home/backups/work/halld/home/mpatsyuk/dirc/TImap1/TImap1.tar.
-rw-r--r-- mpatsyuk/halld-2       6939 2018-10-09 13:32 TImap1/pdf_x-93.0_y55.0_th10.9527_phi132.984.root
...

The tar archive itself is the set of *.tar* files listed in the MANIFEST and residing in the backup directory.
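
Note that the ls listing in the MANIFEST is in lexical order, so TImap1.tar:10 and TImap1.tar:100 appear before TImap1.tar:98. To see the volumes in numerical order, they can be sorted on the number after the colon. The following is a minimal sketch; list_volumes is a hypothetical helper, not part of hd_utilities, and it assumes the naming seen in the example (first volume NAME.tar, later volumes NAME.tar:N, where NAME is the basename of the backup directory).

```shell
#!/bin/bash
# Hypothetical helper: print the volume files of a backup in numerical order.
list_volumes() {
    local dir=$1
    local name
    name=$(basename "$dir")
    (
        cd "$dir" || exit 1
        # The first volume carries no ":N" suffix.
        [ -e "$name.tar" ] && echo "$name.tar"
        # Later volumes: prefix each name with its number, sort numerically,
        # then strip the number again.
        for f in "$name.tar":*; do
            [ -e "$f" ] && echo "${f##*:} $f"
        done | sort -n | cut -d' ' -f2-
    )
}
```

Counting the lines of its output (for example with wc -l) gives the number of volumes that should be present before a restore is attempted.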

Final note: the script only handles making the backup on the write-through cache. The original files (those being backed up) are left untouched.