NOAA

Geophysical Fluid
Dynamics Laboratory



hsmget and hsmput

What are hsmget and hsmput?

As part of the FRE Overhaul we described a three-level storage model that allows data coming from archive to be cached in an area called ptmp (long-term shared scratch) on its way to vftmp (fast scratch) at runtime.

hsmget and hsmput are the scripts that implement this model. They are loosely based on Tim Yeager's proposal.

What hsmget and hsmput do

hsmget and hsmput assume three levels of storage:

  • an archive: permanent storage, possibly remote, on a slow connection and linear media. This is /archive on the HPCS.
  • a long-term shared scratch, accessible across a cluster of supercomputing nodes such as the HPCS. Here two filesystems, /ptmp and /work, serve this purpose. You could also use /vftmp/$user as long-term scratch between jobs on the same node. Data in long-term scratch is not guaranteed to persist: anything requiring permanent storage must be archived.
  • a fast scratch, with no guarantee of continuity beyond a single batch job. This is the $TMPDIR created on /vftmp for a qlogin/qsub session.

On each of these we define a root directory, below which the directory tree is identical for the files being retrieved. That is, given variables called $ARCHROOT (e.g. "/archive/$USER"), $PTMPROOT (e.g. "/ptmp/$USER") and $WORKROOT (e.g. "$TMPDIR"), a request to retrieve $WORKROOT/foo/bar first checks for a newer file at $PTMPROOT/foo/bar, which in turn is refreshed from $ARCHROOT/foo/bar if that copy is newer.
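The mapping of one relative path onto the three roots can be sketched in a few lines of Python. This is only an illustration of the directory convention, not the scripts' own code; the function name tier_paths and the example root values are hypothetical.

```python
from pathlib import Path


def tier_paths(relpath, archroot, ptmproot, workroot):
    """Return where `relpath` lives under each of the three roots.

    The tree below every root is identical, so the same relative
    path is simply appended to each root.
    """
    rel = Path(relpath)
    if rel.is_absolute():
        raise ValueError("relpath must be relative to the roots")
    return {
        "work": Path(workroot) / rel,
        "ptmp": Path(ptmproot) / rel,
        "arch": Path(archroot) / rel,
    }


# Hypothetical roots, standing in for $ARCHROOT, $PTMPROOT and $WORKROOT.
paths = tier_paths("foo/bar", "/archive/alice", "/ptmp/alice", "/vftmp/alice/job.1234")
for tier, path in paths.items():
    print(tier, path)
```

The lookup order is work, then ptmp, then archive, which is why the dictionary is built in that order.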

The remote file might also be inside a tar or cpio container, e.g. $ARCHROOT/foo.cpio containing a file called bar. Containers are used to reduce the number of individual files in archive.
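As a rough sketch of the container case, the Python below pulls a single member out of a tar container and places it where the plain file would have been. It uses tar only, because the Python standard library has no cpio module; the helper name extract_member and the demo layout are assumptions made for illustration, not how hsmget itself is written.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_member(container, member, destdir):
    """Extract one regular file from a tar container into destdir."""
    destdir = Path(destdir)
    destdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(container) as tar:
        src = tar.extractfile(member)  # KeyError if the member is absent
        if src is None:
            raise ValueError(f"{member} is not a regular file in {container}")
        target = destdir / Path(member).name
        target.write_bytes(src.read())
    return target


# Demo: build arch/foo.tar holding "bar", then extract it to ptmp/foo/bar.
with tempfile.TemporaryDirectory() as top:
    archroot, ptmproot = Path(top) / "arch", Path(top) / "ptmp"
    archroot.mkdir()
    payload = b"model output"
    with tarfile.open(archroot / "foo.tar", "w") as tar:
        info = tarfile.TarInfo("bar")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    got = extract_member(archroot / "foo.tar", "bar", ptmproot / "foo")
    print(got.relative_to(top), got.read_bytes())
```

The container name minus its suffix plays the role of the directory, so foo.tar plus member bar ends up at foo/bar below the scratch root.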

Note that transfers are only initiated when the source file is newer than its target: the underlying code is actually a Makefile.
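The make-style rule (copy only when the target is missing or older than its source) can be mimicked in Python as follows. This is a minimal sketch of the timestamp logic under the assumption that a plain file copy stands in for each transfer step; the real scripts delegate to make and to configurable copy commands, and the names is_stale and fetch are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def is_stale(target, source):
    """True if target is missing or strictly older than source."""
    if not target.exists():
        return True
    return target.stat().st_mtime_ns < source.stat().st_mtime_ns


def fetch(relpath, archroot, ptmproot, workroot):
    """Refresh ptmp from archive, then work from ptmp.

    Returns the names of the tiers that were actually refreshed.
    """
    arch = Path(archroot) / relpath
    ptmp = Path(ptmproot) / relpath
    work = Path(workroot) / relpath
    refreshed = []
    for name, src, dst in (("ptmp", arch, ptmp), ("work", ptmp, work)):
        if src.exists() and is_stale(dst, src):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps the source mtime
            refreshed.append(name)
    return refreshed


with tempfile.TemporaryDirectory() as top:
    arch, ptmp, work = (Path(top) / d for d in ("arch", "ptmp", "work"))
    (arch / "foo").mkdir(parents=True)
    (arch / "foo" / "bar").write_text("data")
    first = fetch("foo/bar", arch, ptmp, work)   # both tiers are empty
    second = fetch("foo/bar", arch, ptmp, work)  # everything is up to date
    print(first, second)
```

Because the copy preserves the modification time, a second request for the same file finds both targets current and moves no data, which is the behaviour the Makefile gives for free.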

hsmget syntax

/home/vb/hsm/bin/hsmget retrieves files from remote storage.

Usage: /home/vb/hsm/bin/hsmget [options] file [file...]
  Options:
      -a|--archroot       anchor point on remote storage
      -p|--ptmproot       anchor point on long-term scratch
      -w|--workroot       anchor point on local fast scratch
      -d|--deepget        deep storage retrieve command
      -r|--remotecp       remote copy command
      -c|--clustercp      fast network copy command
      -l|--localcp        local filesystem copy command (usually ln)
      -t|--time           turn on timers
      -f|--force          force transfer even if local file up-to-date
      -n|--nocopy         dry run, no actual data transfer
      -m|--makefile       makefile that this invokes
      -v|--verbose        verbose messages.
      -h|--help           print help message.
  Arguments must be files.
  Container directory may be a tar/cpio archive on remote storage.
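For callers that drive hsmget from a script, the command line can be assembled from the options listed above. The sketch below only builds the argument list and prints it; it assumes that each root option takes its value as the following argument and that file arguments are given relative to the roots, neither of which is stated in the help text. The function name hsmget_argv and the example roots are hypothetical.

```python
import shlex


def hsmget_argv(files, archroot, ptmproot, workroot,
                dry_run=False, verbose=False,
                exe="/home/vb/hsm/bin/hsmget"):
    """Build an hsmget command line from the documented short options."""
    if not files:
        raise ValueError("hsmget needs at least one file argument")
    # Assumption: -a/-p/-w each take their value as the next argument.
    argv = [exe, "-a", archroot, "-p", ptmproot, "-w", workroot]
    if dry_run:
        argv.append("-n")  # dry run, no actual data transfer
    if verbose:
        argv.append("-v")
    return argv + list(files)


argv = hsmget_argv(["foo/bar"], "/archive/alice", "/ptmp/alice",
                   "/vftmp/alice/job.1234", dry_run=True)
print(shlex.join(argv))
```

On a system where hsmget is installed, the resulting list could be handed to subprocess.run(argv, check=True); starting with dry_run=True is a cheap way to see what would be transferred.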

hsmput syntax

/home/vb/hsm/bin/hsmput puts files to remote storage.

Usage: /home/vb/hsm/bin/hsmput [options] path [path...]
  Options:
      -a|--archroot       anchor point on remote storage
      -p|--ptmproot       anchor point on long-term scratch
      -w|--workroot       anchor point on local fast scratch
      -d|--deepput        deep storage command
      -r|--remotecp       remote copy command
      -c|--clustercp      fast network copy command
      -l|--localcp        local filesystem copy command (usually ln)
      -s|--store          remote store type (cpio, tar or directory)
      -t|--time           turn on timers
      -f|--force          force transfer even if local file up-to-date
      -n|--nocopy         dry run, no actual data transfer
      -m|--makefile       makefile that this invokes
      -v|--verbose        verbose messages.
      -h|--help           print help message.
  Arguments must be files or directories.
  Container directory may be a tar/cpio archive on remote storage.

Notes

  • hsmget arguments must be files. This avoids the risk of excessive transfers, full disks and so on caused by inadvertently retrieving large file trees. You can, however, hsmput an entire directory, which is less risky.
  • hsmput only puts data to ptmp unless you explicitly request storing to archive, using hsmput -s cpio or equivalent.
  • To repeat, the underlying code uses make: transfers in either direction are only initiated when the target is out of date.
  • The design can easily be generalized to cases where archive and compute are not collocated: the archive remote and compute local, or vice versa.

Download

Currently available as /home/vb/hsm/bin/hsmget and /home/vb/hsm/bin/hsmput. Users of FREv3 can check them out into their bin directory using

cvs co -r omsk_vb hsmget hsmget.mk hsmput hsmput.mk


created by v. balaji (balaji@princeton.edu) in emacs using the emacs-muse mode.
last modified: 28 July 2008