Publisher User Guide

1 Overview of Publisher

The Publisher system in its current form allows you to make data on Krypton, Gimle or Vagn available to users anywhere in the world.

The published data is a read-only copy of the original data. Published data cannot be changed, only deleted (automatically after a certain time, or manually by the person who published it). Published data is not updated when the original data changes.

You can publish data that is stored on any of the shared filesystems in Vagn, Krypton or Gimle (e.g. /home or /nobackup/*, but not /scratch/local, which is not shared).

You always publish a directory tree with all its contents. If you need to publish a single file, create an empty directory and put the file in it, then publish the directory.
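As a rough sketch of the single-file case, the commands below wrap one file in a new directory. The file and directory names are hypothetical, and the temporary directory is only used to keep the demonstration self-contained; on the cluster the directory has to live on a shared filesystem.

```shell
# Hypothetical names throughout; substitute your own file and directory.
workdir=$(mktemp -d)                  # scratch location for this demonstration only
src_file="$workdir/results.nc"
echo "example data" > "$src_file"     # stand-in for the file you want to publish

# Create an empty directory and copy the single file into it.
pubdir="$workdir/single_file_dataset"
mkdir "$pubdir"
cp "$src_file" "$pubdir/"

# The directory now holds exactly one file and can be published as usual,
# for example with: pcmd "$pubdir" tmp_rossby
ls "$pubdir"
```

Copying rather than moving the file leaves the original in place, which matters because the published data is an independent read-only copy.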

The current system has a capacity of approximately 20 TiB of published data (shared between all users, with no per-user quota).

The Publisher system is connected to the Internet and to Krypton/Gimle/Vagn with a 1 Gbps network, so the maximum combined in/out transfer speed will be roughly 100 MB/s.
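To get a feeling for what this ceiling means in practice, the arithmetic below estimates a lower bound on the transfer time for a dataset. The 1 TiB size is an assumed example, and real transfers will be slower when the link is shared with other users.

```shell
# Assumed dataset size (1 TiB) and the ~100 MB/s ceiling quoted above.
size_bytes=$((1024 * 1024 * 1024 * 1024))
rate_bytes_per_s=$((100 * 1000 * 1000))

# Integer division gives a lower bound on the transfer time in seconds.
seconds=$((size_bytes / rate_bytes_per_s))
hours=$((seconds / 3600))
minutes=$(((seconds % 3600) / 60))

echo "At least ${hours}h ${minutes}m to move 1 TiB at 100 MB/s"
```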

2 Quickstart

  • Put the data you want to publish into a directory. In this example we use the directory "mytestdata".
  • Choose the publication area to use (see table of available areas below). In this example we use the area "tmp_rossby".
  • Run "pcmd mytestdata tmp_rossby". Sample output:
[sm_mkola@analys1 ~]$ pcmd mytestdata tmp_rossby
Checking dataset......
Generating sha1sum......
data
      f9910632ba63c554ee7ba95c4eb8f0618e4bd986
Checking dataset file sizes  --> OK

Publication created with ID: tmp_rossby.74
Export url: http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28, rsync://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28

[sm_mkola@analys1 ~]$ 
  • Note the "Export URL" output. This is the address you should send to the recipient of the data.
  • Run "pcmd -j" until the transfer job is no longer listed. The data is then published.
  • Access the published data using http or rsync. For example, open http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28 in your browser, or tell your rsync client to download rsync://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28.
  • If you lose the export URL, you can always list all your published datasets using "pcmd -qv".
  • You can delete published data using "pcmd -r DATASET_ID", where DATASET_ID is the identifier listed by e.g "pcmd -qv" or the "Publication ID" given when the data was published (e.g "tmp_rossby.74"). Some areas are configured to automatically delete datasets after a certain time.

3 What happens when you publish data?

  1. You run "pcmd" on the Vagn, Krypton or Gimle login node. You need to tell pcmd what directory you want to publish, and to what "publishing area" (e.g "tmp_rossby") you want to publish it. ("pcmd -h" will show all options available when using pcmd)
  2. The system verifies that you are allowed to publish data to the selected publishing area.
  3. pcmd creates a file containing the SHA1 checksum of all files that will be published. This checksum is used by the Publisher system to verify that data was correctly transferred and can also be used by the end-user who downloads published data to verify that all files are intact. (This step is optional, but most publishing areas will use checksumming.)
  4. "pcmd" queues a transfer job. You can check the status of all ongoing transfers using the command "pcmd -j".
  5. Note: the pcmd command will exit as soon as it has performed its checks and created the checksum file. At this point the data is not yet published! Do not delete the original data until it has been successfully published (see the following steps).
  6. The publisher system transfers the data to the export server using rsync.
  7. When all data has been transferred and the checksum has been verified (optional), the export server makes the data available to external users over one or more of the supported protocols (currently http and rsync).
  8. The transfer job is removed (no longer visible when you run "pcmd -j").
  9. You can check all your published data sets using "pcmd -qv".
  10. Remember that you need to notify the persons who will be downloading the data that data is available and what address (URL) to use, e.g http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28 for HTTP download or rsync://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28 for rsync download.
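The checksum step can be illustrated with standard tools. The sketch below is not how pcmd itself works internally; it only shows the general idea of recording SHA1 sums for a directory tree and checking them again later, using GNU sha1sum. The dataset contents and the checksum file location are hypothetical.

```shell
# Demonstration dataset in a temporary directory (hypothetical contents).
dataset=$(mktemp -d)
mkdir "$dataset/sub"
echo "alpha" > "$dataset/a.txt"
echo "beta"  > "$dataset/sub/b.txt"

# Hypothetical checksum file, kept outside the dataset so it does not list itself.
sumfile=$(mktemp)

# Record one SHA1 line per file, with paths relative to the dataset root.
(cd "$dataset" && find . -type f -exec sha1sum {} + | sort -k 2) > "$sumfile"

# Verify: sha1sum -c re-reads every listed file and exits non-zero on a mismatch.
if (cd "$dataset" && sha1sum -c --quiet "$sumfile"); then
    verify_status="intact"
else
    verify_status="corrupt"
fi
echo "Dataset is $verify_status"
```

A recipient who has both the downloaded files and the matching checksum list can run the same "sha1sum -c" check from the top of the downloaded tree.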

4 Publication areas

  • Publishing to a publication area is always restricted - only members of a certain group can do it.
  • A publication area can be configured to delete published data after a certain time, or not at all. Data can always be manually deleted by the person that published it.
  • The URL can be either in the "public" format, where you choose a suitable name (e.g. http://exporter.nsc.liu.se/rossby/anamethatyouchoose), or in the "secret" format (e.g. http://exporter.nsc.liu.se/c690d383de8940308bf3c9f9cbd6e132). The advantage of the secret format is that it is very hard to find the address of data you are not supposed to have access to by guessing or brute force. However, this is still a weak security mechanism: anyone who finds out the URL (e.g. by checking your browser history or snooping on your network) can download the data.
  • A publication area can be accessible via http and/or rsync (note: the native rsync protocol on port tcp/873 is used, not rsync over SSH).
  • A publication area can be configured to limit the type of data that can be published on it. This feature is currently very limited; only these checks are supported:
    • minSize - all files must be bigger than N bytes
    • maxSize - all files must be smaller than N MB
    • netCDF - all files must be netCDF files (only the file suffix is checked)
    • README - all published datasets must contain a file named README
  • If you need a new publishing area, please contact smhi-support@nsc.liu.se to discuss this.
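Two of these checks are easy to approximate locally before publishing. The sketch below is not the code Publisher runs; it assumes that README must sit in the top directory of the dataset and that "MB" means 1024 * 1024 bytes, neither of which is stated above. The dataset contents and the 1 MB limit are hypothetical.

```shell
# Demonstration dataset (hypothetical contents).
dataset=$(mktemp -d)
echo "Describes the dataset" > "$dataset/README"
echo "payload" > "$dataset/data.nc"

max_mb=1          # hypothetical maxSize limit in MB
problems=0

# README check: assumed to mean a file named README in the dataset root.
if [ ! -f "$dataset/README" ]; then
    echo "missing README"
    problems=$((problems + 1))
fi

# maxSize check: list regular files larger than the limit (size given in bytes).
max_bytes=$((max_mb * 1024 * 1024))
too_big=$(find "$dataset" -type f -size +"${max_bytes}"c)
if [ -n "$too_big" ]; then
    echo "files above ${max_mb} MB:"
    echo "$too_big"
    problems=$((problems + 1))
fi

echo "problems found: $problems"
```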

4.1 Available publication areas

You can always see the current list of publication areas using the command "pcmd -l". The list below is not guaranteed to be up to date.

Name             Unix groups allowed to publish  Protocols   URL type              Auto-deleted after (days)  Limits
tmp_foua         sm_foua                         http,rsync  secret url            30                         max file size 1 TB
tmp_foul         sm_foul                         http,rsync  secret url            30                         max file size 1 TB
tmp_fouo         sm_fouo                         http,rsync  secret url            30                         max file size 1 TB
tmp_foup         sm_foup                         http,rsync  secret url            30                         max file size 1 TB
tmp_bpom         sm_bpom                         http,rsync  secret url            30                         max file size 1 TB
tmp_ml           sm_ml                           http,rsync  secret url            30                         max file size 1 TB
tmp_mo           sm_mo                           http,rsync  secret url            30                         max file size 1 TB
tmp_misu [1]     misu                            http,rsync  secret url            30                         max file size 1 TB
tmp_rossby       rossby                          http,rsync  secret url            30                         max file size 1 TB
tmp_kthmech [1]  kthmech                         http,rsync  secret url            30                         max file size 1 TB
tmp_miuu         miuu                            http,rsync  secret url            30                         max file size 1 TB
rossby_sc        roadmin                         http,rsync  secret url            no                         max file size 1 TB
rossby_pr        roadmin                         http,rsync  user-selectable name  7                          max file size 1 TB

5 Using pcmd

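Getting help:

pcmd -h

Exporting a directory (here the directory "mytestdata" to the area "tmp_rossby"):

pcmd mytestdata tmp_rossby

Checking all your published data:

pcmd -qv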

Download and verify exported data (from anywhere in the world) using rsync:
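A minimal sketch, assuming the export ID from the Quickstart sample output and a hypothetical local directory name. The download is wrapped in a function so that nothing is transferred until you call it.

```shell
# Export ID copied from the Quickstart sample output; use the one you were sent.
export_id="b7b00058ad424381909938b0492ffb28"
dest="./publisher_download"           # hypothetical local target directory

# Native rsync URL; the trailing slash copies the dataset contents into $dest.
rsync_url="rsync://exporter.nsc.liu.se/${export_id}/"

fetch_with_rsync() {
    mkdir -p "$dest"
    # -a preserves the directory tree, -v lists files as they arrive.
    # --checksum makes a repeated run compare file contents rather than
    # size and timestamp, which works as a simple integrity re-check.
    rsync -av --checksum "$rsync_url" "$dest/"
}

echo "Run fetch_with_rsync to download $rsync_url"
```

Running fetch_with_rsync a second time should transfer nothing if the first download completed correctly.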

Download a single file using http:
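A minimal sketch using wget, assuming the export URL from the Quickstart sample output. The file name README is hypothetical; use the path of the file as it appears inside the published dataset.

```shell
# Export URL copied from the Quickstart sample output; use the one you were sent.
export_url="http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28"
file_path="README"                    # hypothetical file name inside the dataset
file_url="${export_url}/${file_path}"

fetch_one_file() {
    # Saves the file in the current directory under its server-side name.
    wget "$file_url"
}

echo "Run fetch_one_file to download $file_url"
```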

Recursively download and verify a dataset (from anywhere in the world) using HTTP:
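A minimal sketch using wget, assuming the export URL from the Quickstart sample output and a hypothetical local directory name. It also assumes that the server presents directory listings as generated index pages, which are skipped here. How to verify the result depends on the checksum file shipped with the dataset, whose name is not specified in this guide.

```shell
# Export URL copied from the Quickstart sample output; use the one you were sent.
export_url="http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28/"
dest="./publisher_download"           # hypothetical local target directory

fetch_tree_with_wget() {
    # --recursive --level=inf : follow the whole tree (wget stops at depth 5 by default)
    # --no-parent             : never climb above the dataset directory
    # --no-host-directories   : do not create an exporter.nsc.liu.se/ directory
    # --reject                : skip generated index pages (assumed to exist)
    wget --recursive --level=inf --no-parent --no-host-directories \
         --reject 'index.html*' --directory-prefix="$dest" "$export_url"
}

echo "Run fetch_tree_with_wget to download $export_url into $dest"
```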

Deleting data (DATASET_ID is the identifier shown by "pcmd -qv", e.g. "tmp_rossby.74"):

pcmd -r DATASET_ID

6 Limitations and unexpected behaviour

6.1 Permitted file types

A dataset may only contain regular files and directories. If the directory tree (dataset) that you try to publish contains any other type of object, such as a symbolic link or a socket, pcmd will display an error message and exit.

This is a design choice and not a bug or technical limitation.
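To find offending entries before running pcmd, you can list everything in the tree that is neither a regular file nor a directory. This is a sketch using standard find, not the check pcmd itself performs; the demonstration dataset is hypothetical.

```shell
# Demonstration dataset containing one symbolic link (hypothetical contents).
dataset=$(mktemp -d)
echo "data" > "$dataset/file.txt"
mkdir "$dataset/sub"
ln -s file.txt "$dataset/link_to_file"

# List everything that is neither a regular file nor a directory.
# find does not follow symbolic links by default, so links show up here.
unsupported=$(find "$dataset" ! -type f ! -type d)

if [ -n "$unsupported" ]; then
    echo "Not publishable as-is:"
    echo "$unsupported"
fi
```

Replacing a symbolic link with a copy of its target (or removing it) makes the tree publishable again.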

6.2 Deleting published data

Published data can be deleted using "pcmd -r". When a dataset is deleted, it is no longer accessible to users.

However, information about the published dataset is retained in the database and can be displayed using e.g. "pcmd -qv" (the dataset is displayed as "Deleted").

The URL used by a deleted dataset cannot be reused for another dataset (i.e. you cannot publish some data as http://server/area/my-latest-data and then replace it with updated data next week).

6.3 Publishing files that are not readable by "other"

Note: this behaviour is a bug or undocumented behaviour in Publisher and will change in a future version.

When you publish a directory tree, the permissions of the files are copied along with the files. If you export files or directories that are only accessible by "user" or "group" they will not be accessible after having been exported.

Workaround: make sure that all files and directories are accessible to "other" before publishing them, e.g. by running

chmod -R o+rX <DIRECTORY>
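The effect of the workaround can be checked with find. The sketch below counts entries that "other" cannot read, plus directories that "other" cannot enter, before and after the chmod. The demonstration tree is hypothetical.

```shell
# Demonstration tree that is private to the owner (hypothetical contents).
dataset=$(mktemp -d)
mkdir "$dataset/sub"
echo "data" > "$dataset/sub/file.txt"
chmod 700 "$dataset" "$dataset/sub"
chmod 600 "$dataset/sub/file.txt"

# Count entries lacking o+r, and directories lacking o+x.
count_blocked() {
    find "$1" \( ! -perm -004 -o -type d ! -perm -001 \) | wc -l | tr -d ' '
}

blocked_before=$(count_blocked "$dataset")

# The workaround: o+r everywhere, o+x only on directories (and on files
# that are already executable by someone).
chmod -R o+rX "$dataset"

blocked_after=$(count_blocked "$dataset")
echo "blocked entries: before=$blocked_before after=$blocked_after"
```

The capital X is what keeps plain data files from becoming executable while still letting "other" traverse the directories.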

7 Availability

Publisher is not designed to be a high-availability system. It can be considered to be approximately as reliable as an NSC cluster login node (e.g Gimle).

In practice, this means:

  • Publisher will be unavailable for a few minutes when operating system updates are applied (might happen anywhere from once a week to a few times per year)
  • Publisher will be unavailable for a few minutes when we add, remove or modify publication areas (might happen a few times per year).
  • Publisher might be unavailable from a few hours to a day or two if a major hardware problem occurs (there is only one Publisher server; if it fails, we have to move the service to another server or repair the broken one).

Published, non-deleted datasets are backed up daily to tape for disaster recovery purposes.

The Publisher internal database that keeps track of all metadata is backed up to disk hourly and to tape daily.

If this level of availability is not enough for your needs, store your data elsewhere, or contact NSC to discuss how we can improve Publisher.

8 How to get help

If you need help using Publisher, if something does not work as expected, or if you have any other questions, please send an email to your normal support address (vagnekman-support@snic.vr.se for Vagn users and smhi-support@nsc.liu.se for Gimle/Krypton users).

Footnotes:

1 NOTE: Publisher is an SMHI-funded system! Non-SMHI Vagn users (e.g kthmech, misu) are until further notice allowed to use the system for temporary file transfers, but if this causes problems for SMHI users (e.g by filling up the disk) this access may be revoked.






Page last modified: 2013-03-13 16:19
For more information contact us at info@nsc.liu.se.