NSC Centre Storage

The "NSC Centre Storage" system provides shared file storage for the Triolith, Kappa and Matter systems.

This page contains information needed for users of NSC Centre Storage.

If you are the Principal Investigator ("PI") of a project using an NSC system, please also read the Information for PIs page, which describes how to manage the storage for your project, and the Storage allocations page, which describes how to apply for (more) storage space for your project.

Recent changes (October 2014 - January 2015)

The previous storage system has been replaced with a new one of much higher capacity and performance.

We have also changed how storage is allocated: users no longer have large amounts of personal storage (/nobackup/global/USER is no more); instead, computing projects are assigned storage which is shared by all users in the project (/proj/PROJECTDIR).

How to use NSC Centre Storage

Where should I store my files?

Users can store files in several different locations. Each location has its own characteristics. Some locations (/home, /proj) are located on NSC's Centre Storage system and can be accessed from Triolith, Kappa and Matter.

There are limits to how much data you can store in each location. On /home and /proj, a quota system limits how much you can use. On /scratch/local you are limited by the physical size of the disk in each compute node.

Use the command snicquota to show how much space is used, and how much is available, on /home and /proj (and which project directories you have access to).

Do not store large amounts of data in other writable locations (e.g on /tmp, /var/tmp, /dev/shm), since the space there is very limited and shared by all users of that node.

Your home directory (/home/YOUR_USERNAME)

Your home directory is intended for storage of small amounts of data, e.g:

  • Application settings (usually stored in hidden files/directories in your home directory)
  • Small amounts of scripts, source code etc that do not belong to any one project.

By default, your home directory is limited to 20 GiB of files (quota). You may temporarily store up to 30 GiB (limit), but only for at most a week. We believe that this should be enough for all users' needs.

Note: there is also a limit of one million files per user.
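
If you want a rough idea of how many files (and directories) you currently have in your home directory, you can count them with something like the following (a simple sketch; it may take a little while if you have many files):

# Count files and directories under your home directory
find ~ | wc -l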

Project storage directories (/proj/PROJECTDIR)

By default, each project that has been allocated computing time on Triolith, Kappa or Matter will have a directory under /proj (e.g /proj/snic2014-1-123, /proj/somename) where the project members can store their data associated with that project. The name of the directory is decided by the project Principal Investigator ("PI").

If you cannot find the project directory for a new project, it might be because the project PI has not yet chosen a directory name.

If you can find the project directory for a project that you are a member of but cannot access it, try logging out and back in again. If that does not help, contact NSC Support.

You can see the project directories available to you (many users are members of several projects) using the snicquota command. It will also show how much space you are using, and how much is available. You can also see how much space other users in the project are using (snicquota -a; run snicquota --help to see all available options).
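
For example (a short sketch of typical invocations; the output itself is not shown here):

# Show usage, quota and available space for /home and your project directories
snicquota

# Also show how much each user in a project directory is using
snicquota -a

# List all available options
snicquota --help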

Note: despite the lack of the word "nobackup" in the directory name /proj, we do not make tape backups of /proj data! Read the "Is my data safe?" section for more information.

Limits / quota

Project directories are limited both in how much data (GiB) they can store, and in how many files they can store. The data quota limit is the most important one. The file quota limit is mostly a way to discover when projects begin to store an excessive number of files (which can be a performance problem).

Please note that both limits can be raised, and that getting more storage is typically very easy: in most cases, all that is needed is an email to NSC explaining how much you need and why.

The "Quota" displayed by snicquota is the actual volume of data / number of files that the project is allowed to store. This is the limit you should use when planning your storage use.

You may exceed the Quota limit for up to 30 days ("Grace" time as shown by snicquota). If your usage exceeds the Quota limit for more than 30 days, all writes to your project directory (for all users) will be stopped until usage drops below the Quota limit. There is also an upper hard "Limit" (as displayed by snicquota) that you may never exceed. The hard Limit is currently (2014-11-24) set to 150% of your Quota, but will probably be lowered once all data has been moved to /proj.

Due to the significant impact it will have on your running jobs (i.e they will almost certainly fail), you should make sure that you never exceed the hard limit or the 30-day limit. Try to stay below the Quota at all times. It is better to ask for a higher storage allocation than to risk hitting your limit and having jobs fail.

Projects without their own storage

It's possible for several projects to merge their allocated storage into a single project directory. If you see fewer available directories than the number of projects you are a member of, that might be why.

Please ask the PI of your project if you don't know where to store data associated with that project. You can find the name of the PI in NSC Express.

Personal work areas

NSC recommends that projects give all users their own directory within the project directory to use as a working area. By default, NSC will create /proj/PROJECTDIR/users/USERNAME for all project directories a user has access to, the first time the user logs in. If your project PI has not decided otherwise, you can assume that this is where you should store your data.
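
To list the personal work areas that have been created for you, something like the following can be used (a sketch; only directories you can access will be shown):

# List your personal work areas under the project directories
ls -d /proj/*/users/$USER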

What types of data to store in project storage areas

You should use the project storage directory for all data associated with the project, except for temporary files only needed during the job (these should be stored on the local disk in each compute node, see below). This includes:

  • Input files
  • Output files
  • Job scripts
  • Any applications installed by project members

If you want extra protection for small-volume, high-value data such as source code or scripts, you can store it on /home (or keep an extra copy there, or outside NSC).
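
As a hedged sketch of keeping such an extra copy (PROJECTDIR and the directory names are placeholders):

# Keep a second copy of small, high-value files (e.g scripts) in your home directory
rsync -a /proj/PROJECTDIR/users/$USER/scripts/ ~/scripts-backup/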

Misc

The environment variable $SNIC_NOBACKUP is set in the job script environment to the /proj/PROJECTDIR/users/USERNAME directory for the project the job is using, if such a directory exists.

Local disk in each compute node

Each compute node has a local hard disk (200 GiB on Kappa, 500 GiB on Matter and Triolith). Most of that disk is available to users for storing temporary files that are only needed during a job.

The environment variable $SNIC_TMP in the job script environment points to a writable directory on the local disk that you can use.

Please note that anything stored on the local disk is deleted when your job ends. If some temporary or output files stored there need to be preserved, copy them to project storage at the end of your job script.
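
A minimal sketch of this pattern in a job script (my_program and the file names are placeholders; $SNIC_NOBACKUP is assumed to point to your personal work area as described above):

# Stage input to the node-local disk and run there, to reduce load on Centre Storage
cd $SNIC_TMP
cp $SNIC_NOBACKUP/input.dat .

# Run the application against the local copy of the input (placeholder program name)
my_program input.dat > output.dat

# Copy the results back to project storage before the job ends -
# anything left in $SNIC_TMP is deleted when the job finishes
cp output.dat $SNIC_NOBACKUP/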

Please use the local disk when possible. By doing so, you're reducing the load on the Centre Storage servers, which makes the shared /home and /proj file systems as fast as possible for you and all other users.

If you need help in making your jobs use the local disk, please contact NSC Support.

Long-term storage

NSC Centre Storage is only intended for short- and medium-term storage during your project. When your project ends, you must remove your data from Centre Storage.

If you don't have space for the data at your home university, NSC recommends using National Storage for archiving it.

If one project directly replaces another (e.g SNIC 2014/8-42 continues next year as SNIC 2015/8-26), the project PI can choose to keep the existing storage directory, but connected to the new project. In that case, some data can be kept, and job scripts etc do not need to be changed.

However, please note that Centre Storage is still not a suitable place to store data long-term (e.g due to no tape backup).

Is my data safe?

We consider the storage system to be very reliable. It is based on the same proven technology (GPFS) as the previous system (which we consider to have been very reliable), but has been improved in several ways, e.g:

  • Uses end-to-end check sums and version numbers to detect, locate and correct data corruption that would be undetectable in earlier systems.
  • Faster rebuild times after a disk failure with "declustered RAID". This minimizes the window where the system is vulnerable to additional disk failures, and also minimizes the performance impact to users during the rebuild following a disk failure.

Data on the system is protected against multiple disk failures using "8+2 Reed-Solomon" or better (i.e two disks out of a group of 10 can fail without affecting access to data). Combined with the short rebuild times after a disk failure we consider the risk of losing data due to disk failures to be very low.

We also use "snapshots" to protect against you (or NSC) accidentally deleting files.

However, we do not protect you against all types of failures. Some events can lead to loss of data, e.g

  • Various types of physical "disasters": flooding, fire, theft, vandalism, ...
  • Software bugs in the storage system
  • NSC staff mistakes
  • Intrusions
  • Data accidentally being deleted, and this not being discovered until much later (when the data no longer exists in a snapshot)

After considering the value of our users' data (which can often be recreated by re-running compute jobs), the cost of making off-site backups (which could protect against most disasters and some software bugs, some mistakes and some intrusions), and the low risk of data loss due to the above risks, we have decided to only perform limited tape backups of home directories (weekly) and to make no tape backups of project storage.

Put differently: for a fixed amount of money available for storage, we bought hard drives, not backup tapes.

If your data is very valuable or irreplaceable, we recommend that you keep copies outside NSC. If you cannot store that data at your home university, we can recommend National Storage.

Can I recover a deleted file?

Yes, sometimes. The system uses "snapshots" (a read-only point-in-time view of the file system that can be used to restore files from).

Snapshots are taken at certain intervals (at least daily) and kept for a certain time (not decided yet, but probably a few days rather than weeks or months).

Snapshots are available on /home and /proj.

To recover deleted files from a snapshot (or check the contents of a file as it was at an earlier time), go to /proj/.snapshots or /home/.snapshots. There you will find one directory per available snapshot. Then change into the snapshot directory (e.g cd daily-Thursday), and you will see the files as they were at the time the snapshot was taken. To "undelete" a file, simply copy it to a location outside the .snapshots directory tree.

Files created and deleted in the time between when two snapshots were taken cannot be restored.

Files that were deleted too long ago (before the currently oldest snapshot was taken) cannot be restored from snapshots.

Example - restoring a file from a snapshot:

Oops, I have accidentally deleted a file:

[kronberg@triolith1 ~]$ ls -al /proj/nsc/users/kronberg/ior.*
ls: cannot access /proj/nsc/users/kronberg/ior.*: No such file or directory

List the available snapshots:

[kronberg@triolith1 ~]$ ls -lrt /proj/.snapshots/
total 448
drwxr-xr-x 115 root root 32768 Oct 17 16:16 daily-Saturday
drwxr-xr-x 116 root root 32768 Oct 18 17:20 daily-Sunday
drwxr-xr-x 116 root root 32768 Oct 18 17:20 daily-Monday
drwxr-xr-x 117 root root 32768 Oct 20 12:10 daily-Tuesday
drwxr-xr-x 123 root root 32768 Oct 21 14:20 daily-Wednesday
drwxr-xr-x 123 root root 32768 Oct 21 14:20 daily-Thursday
drwxr-xr-x 124 root root 32768 Oct 24 00:00 daily-Friday

Check in which snapshots my missing file is present:

[kronberg@triolith1 ~]$ ls -al /proj/.snapshots/*/nsc/users/kronberg/ior.*
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Monday/nsc/users/kronberg/ior.testfile.triolith
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Saturday/nsc/users/kronberg/ior.testfile.triolith
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Sunday/nsc/users/kronberg/ior.testfile.triolith

Restore the file by copying a version (in this case, the latest one) of it:

[kronberg@triolith1 ~]$ cp /proj/.snapshots/daily-Monday/nsc/users/kronberg/ior.testfile.triolith /proj/nsc/users/kronberg/
[kronberg@triolith1 ~]$ ls -al /proj/nsc/users/kronberg/ior.*
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 27 12:12 /proj/nsc/users/kronberg/ior.testfile.triolith
[kronberg@triolith1 ~]$ 

Tape backups

For disaster recovery purposes, we make tape backups of the /home and /software directories at least weekly.

If you want us to try to recover a file from this backup, please contact NSC Support.

How can I transfer data to and from NSC?

See this page

What to do if I need more space

Talk to the Principal Investigator (PI) of your project (log in to NSC Express if you don't know who the PI is).

The PI is responsible for how data is stored in the project directory, and is the one who should ask for more space when needed.

NSC Centre Storage policy

  • NSC Centre Storage is intended for short- and medium-term storage of data associated with ongoing computing or analysis projects using NSC's academic systems (currently Triolith, Kappa and Matter). [1]
  • Storage is allocated to a project according to its needs. A project will be provided with the amount of storage needed to do the research described in their project application if that is possible without causing problems for other users. The project must explain its needs to NSC, and must be prepared to show that it will be storing data and using the storage system efficiently.
  • Storage is limited both by volume (bytes), and number of files stored. [2]
  • All storage allocations are for a limited time only (how long can be seen in NSC Express). Projects that continue under another name can keep their storage directory name (/proj/projname), but NSC may review the amount of storage granted and may then increase or decrease it.
  • Storage must be used efficiently and in a way that does not cause problems for NSC staff or other projects. This includes using suitable file formats, compressing data when appropriate, and packaging data in archive formats (e.g tar, zip) when appropriate (see the example after this list). It also includes how applications perform I/O (e.g not doing unnecessary I/O, or using I/O patterns that cause problems for other users).
  • The decision on how much storage to allocate to a project is currently made by NSC, regardless of storage requirements put into the SNAC application (but we will of course take that into consideration). If you are unhappy with our decision, please contact NSC Support or the NSC Director.
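
As a hedged illustration of the packaging point above (the directory and file names are placeholders):

# Pack a finished result directory into a single compressed archive
tar czf results-run42.tar.gz results-run42/

# Verify that the archive can be read before removing the original directory
tar tzf results-run42.tar.gz > /dev/null && rm -r results-run42/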

Storage system hardware, software and performance

The new storage system consists of three IBM "System x GPFS Storage Server" Model 26 building blocks, a.k.a "GSS26".

The system occupies two 19" racks and consists of six servers and 18 disk enclosures. In total there are 1044 spinning hard disks (4 TB each) and 18 SSD disks (200 GB each).

On this hardware we currently (as of 2014-10-29) run version 2.0 of IBM's GSS software stack, which consists of:

  • Linux on the servers (Red Hat Enterprise Linux version 6.5)
  • IBM's "GNR" software RAID layer (part of GPFS)
  • GPFS 4.1

The total disk space available to store files is approximately 2800 TiB. The difference between 1044*4 TB "raw" space on the disks and the available 2800 TiB on the file system is mostly due to:

  • RAID overhead [3]
  • The difference between a Tebibyte/TiB and a Terabyte/TB
  • "Spare space" - unused disk space that is used to restore data redundancy when a disk fails.

The storage system is connected to Triolith using four Mellanox FDR 56 Gbit/s InfiniBand links per server. In practice, the hard disks will often be the bottleneck for I/O; the maximum sustained aggregate transfer speed (when writing or reading from many compute nodes simultaneously) that we have seen during testing is around 45 GiB per second. This is more than 10 times the theoretical maximum speed of the previous system.

From a single thread/core on a single Triolith compute node you can expect to read or write up to around 1 GiB per second (as long as the disk system is not overloaded by other jobs). On login and analysis nodes this figure will be higher, around 3.5 GiB/s.

Kappa and Matter are connected to the system using Ethernet, with a maximum total bandwidth of 2.5 GiB/s.

The total cost (including computer room space, power, cooling, hardware, NSC staff, hardware support, ...) for the planned lifetime of the system (5 years) will be around 15 million SEK, or around 1000 SEK per usable TiB per year.

The power consumption (included in the total cost above) is around 18 kW, or around 6 Watt per TiB of available space.


  1. Long-term storage is available on other systems, e.g National Storage.

  2. There are two reasons to limit the number of files stored by a project. First, certain operations, such as checking/repairing the file system and starting it after certain types of crashes, take time proportional to the number of files in it. Second, every file, even an empty one, consumes a certain amount of storage space for metadata (filename, permissions, timestamps, ...), which is not counted towards the normal quota. The files limit is currently not shown in NSC Express. We will typically be generous when asked to raise this limit; it acts mostly as a tripwire to alert us when a project starts storing data in a problematic way (millions of small files).

  3. File data is protected by an 8+2 Reed-Solomon code, i.e 8 data blocks require 2 parity blocks to be stored on disk. Metadata (file system structure, directories, contents of small files) is protected by 3-way replication.

Storage allocations

Information on how to apply for (additional) storage on the new NSC Centre Storage

Information for PIs

Information on the new NSC Centre Storage for Principal Investigators of existing projects