What are hsmget and hsmput?
As part of the FRE Overhaul we described a three-level storage model
that allows data coming from archive to be cached in an area called
ptmp (long-term shared scratch) while being copied to vftmp (fast
scratch) at runtime.
hsmget and hsmput are the scripts that implement this. They are loosely
based on Tim Yeager's proposal.
What hsmget and hsmput do
hsmget and hsmput assume three levels of storage:
- an archive, possibly remote, slow connection, linear media,
permanent storage. This is /archive on the HPCS.
- a long-term shared scratch, accessible across a cluster of
supercomputing nodes such as the HPCS. Here we have two filesystems
/ptmp and /work that serve this purpose. You could also use
/vftmp/$user as a long-term scratch between jobs on the same node.
Data in long-term scratch is not guaranteed to stay forever:
anything requiring permanent storage must be archived.
- a fast scratch, not guaranteed continuity beyond a single batch
job. This is the $TMPDIR created on /vftmp for a qlogin/qsub
session.
On each of these we define a root directory, below which the directory
tree is identical for the files being retrieved. That is to say, given
variables called $ARCHROOT (e.g. "/archive/$USER"), $PTMPROOT (e.g.
"/ptmp/$USER") and $WORKROOT (e.g. "$TMPDIR"), a request to retrieve
$WORKROOT/foo/bar will look for a newer file called $PTMPROOT/foo/bar,
which in turn will look for a newer file called $ARCHROOT/foo/bar.
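The lookup order can be sketched in shell as follows. This is only an illustration of the mapping described above, not the scripts' actual code: the function name hsm_candidates is hypothetical, and the default roots are the example values from the text.

```shell
# Hypothetical helper (not part of hsmget): list where a relative path
# would be looked for, from the fastest storage level to the slowest.
hsm_candidates() {
    relpath=$1
    # Defaults are the example values from the text; override via environment.
    workroot=${WORKROOT:-$TMPDIR}
    ptmproot=${PTMPROOT:-/ptmp/$USER}
    archroot=${ARCHROOT:-/archive/$USER}
    printf '%s\n' "$workroot/$relpath" "$ptmproot/$relpath" "$archroot/$relpath"
}

# Example: print the same relative path under each of the three roots.
# The root values here are made up for the demonstration.
( WORKROOT=/vftmp/job; PTMPROOT=/ptmp/me; ARCHROOT=/archive/me
  hsm_candidates foo/bar )
```

The point of the identical tree below each root is that a single relative path is enough to name the file at all three levels.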
The remote file might also be in a tar or cpio container, e.g.
$ARCHROOT/foo.cpio which contains a file called bar. This is done to
reduce the number of individual files in archive.
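Retrieval from a container can be pictured as extracting a single member. The sketch below assumes a tar container whose members are stored relative to the container's directory (so foo.tar holds bar). That layout and the function name get_from_container are assumptions made for illustration, not documented behaviour of hsmget.

```shell
# Hypothetical helper: extract one member from <archroot>/<dir>.tar
# into <workroot>/<dir>/. Assumes members are stored without the
# leading directory name, which the text suggests but does not confirm.
get_from_container() {
    archroot=$1
    workroot=$2
    relpath=$3                      # e.g. foo/bar
    dir=$(dirname "$relpath")       # foo
    member=$(basename "$relpath")   # bar
    mkdir -p "$workroot/$dir" || return 1
    # Extract only the requested member, not the whole container.
    tar -xf "$archroot/$dir.tar" -C "$workroot/$dir" "$member"
}
```

A call such as get_from_container "$ARCHROOT" "$WORKROOT" foo/bar would then leave the file at $WORKROOT/foo/bar, the same place an uncontainerized transfer would put it.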
Note that transfers are only initiated when the source file is newer:
the underlying code is actually a Makefile.
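make decides whether to rebuild a target by comparing modification times: the target is rebuilt if it is missing or older than its source. A minimal shell sketch of the same rule, assuming plain cp -p stands in for the configurable copy commands and using a hypothetical function name:

```shell
# Hypothetical helper: copy src to dst only when dst is missing or
# older than src, mimicking make's timestamp rule.
copy_if_newer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        mkdir -p "$(dirname "$dst")" || return 1
        # -p preserves the modification time, so an unchanged source
        # is not seen as newer on the next call.
        cp -p "$src" "$dst" || return 1
        echo "copied"
    else
        echo "up-to-date"
    fi
}
```

Calling the function twice on an unchanged source copies the first time and reports up-to-date the second time, which is the behaviour the note above describes for repeated transfers.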
hsmget syntax
/home/vb/hsm/bin/hsmget retrieves files from remote storage.
Usage: /home/vb/hsm/bin/hsmget [options] file [file...]
Options:
  -a|--archroot    anchor point on remote storage
  -p|--ptmproot    anchor point on long-term scratch
  -w|--workroot    anchor point on local fast scratch
  -d|--deepget     deep storage retrieve command
  -r|--remotecp    remote copy command
  -c|--clustercp   fast network copy command
  -l|--localcp     local filesystem copy command (usually ln)
  -t|--time        turn on timers
  -f|--force       force transfer even if local file up-to-date
  -n|--nocopy      dry run, no actual data transfer
  -m|--makefile    makefile that this invokes
  -v|--verbose     verbose messages.
  -h|--help        print help message.
Arguments must be files.
Container directory may be a tar/cpio archive on remote storage.
hsmput syntax
/home/vb/hsm/bin/hsmput puts files to remote storage.
Usage: /home/vb/hsm/bin/hsmput [options] path [path...]
Options:
  -a|--archroot    anchor point on remote storage
  -p|--ptmproot    anchor point on long-term scratch
  -w|--workroot    anchor point on local fast scratch
  -d|--deepput     deep storage command
  -r|--remotecp    remote copy command
  -c|--clustercp   fast network copy command
  -l|--localcp     local filesystem copy command (usually ln)
  -s|--store       remote store type (cpio, tar or directory)
  -t|--time        turn on timers
  -f|--force       force transfer even if local file up-to-date
  -n|--nocopy      dry run, no actual data transfer
  -m|--makefile    makefile that this invokes
  -v|--verbose     verbose messages.
  -h|--help        print help message.
Arguments must be files or directories.
Container directory may be a tar/cpio archive on remote storage.
Notes
- Note that hsmget arguments must be files. This is to avoid the risk
of excessive transfers, filling disks and so on, by inadvertently
retrieving large file trees. However, you can hsmput an entire
directory: this is less risky.
- hsmput only puts data to ptmp unless you explicitly request storing
to archive using hsmput -s cpio or equivalent.
- To repeat, the underlying code uses make: transfers in either
direction are only initiated when the target is out of date.
- The idea is that this can easily be generalized to the case where
archive and compute are not collocated, i.e. a situation where the
archive is remote and compute local, or vice versa.
Download
The scripts are currently available as /home/vb/hsm/bin/hsmget and
/home/vb/hsm/bin/hsmput. Users of FREv3 can check them out into their
bin directory using:
cvs co -r omsk_vb hsmget hsmget.mk hsmput hsmput.mk