The "NSC Centre Storage" system provides shared file storage for the Triolith, Kappa and Matter systems.
This page contains information needed for users of NSC Centre Storage.
If you are the Principal Investigator ("PI") of a project using an NSC system, please read this page, which contains information on how to manage the storage for your project, and this page, which explains how to apply for (more) storage space for your project.
The previous storage system has been replaced with a new one of much higher capacity and performance.
We have also changed how storage is allocated: users no longer have large amounts of personal storage (/nobackup/global/USER is no more); instead, each computing project is assigned storage which is shared by all users in the project.
Users can store files in several different locations. Each location has its own characteristics. Some locations (e.g. /home and /proj) are located on NSC's Centre Storage system and can be accessed from Triolith, Kappa and Matter.
There are limits to how much data you can store in each location. On /proj, a quota system limits how much you can use. On /scratch/local you are limited by the physical size of the disk in each compute node.
Use the command snicquota to show how much space is used, and how much is available, on /proj (and which project directories you have access to).
Do not store large amounts of data in other writable locations (e.g. on /dev/shm), since the space there is very limited and shared by all users of that node.
Your home directory is intended for storage of small amounts of data, e.g. source code, scripts and configuration files.
By default, your home directory is limited to 20 GiB of files (quota). You can exceed this and store up to 30 GiB for up to a week (limit). We believe that this should be enough for all users' needs.
Note: there is also a limit of one million files per user.
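If you want a quick estimate of your usage with standard tools (for example while cleaning up), something like the following works; the numbers may differ slightly from what the quota system reports:

```shell
# Total size of your home directory (may take a while if you have many files)
du -sh "$HOME"

# Number of files, to compare against the one-million-file limit
find "$HOME" -type f | wc -l
```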
By default, each project that has been allocated computing time on Triolith, Kappa or Matter will have a directory under /proj (e.g. /proj/somename) where the project members can store their data associated with that project. The name of the directory is decided by the project Principal Investigator ("PI").
If you cannot find the project directory for a new project, it might be because the project PI has not yet chosen a directory name.
If you can find the project directory for a project that you are a member of but cannot access it, try logging out and back in again. If that does not help, contact NSC Support.
You can see the project directories available to you (many users are members of several projects) using the snicquota command. It will also show how much space you are using, and how much is available. You can also see how much space other users in the project are using with snicquota -a (run snicquota --help to see all available options).
Note: despite the lack of the word "nobackup" in the directory name /proj, we do not make tape backups of /proj data! Read the "Is my data safe?" section for more information.
Project directories are limited both in how much data (GiB) they can store, and in how many files can be stored. The data quota limit is the most important one. The file quota limit is mostly a way to discover when projects begin to store an excessive number of files (which can be a performance problem).
Please note that both limits can be raised, and that getting more storage is typically very easy; in most cases all that is needed is an email to NSC explaining how much you need, and why.
The "Quota" displayed by snicquota is the actual volume of data / number of files that the project is allowed to store. This is the limit you should use when planning your storage use.
You may exceed the Quota limit for up to 30 days (the "Grace" time as shown by snicquota). If your usage exceeds the Quota limit for more than 30 days, all writes to your project directory (for all users) will be stopped until usage drops below the Quota limit. There is also an upper hard "Limit" (as displayed by snicquota) that you may never exceed. The hard Limit is currently (2014-11-24) set to 150% of your Quota, but will probably be lowered once all data has been moved to /proj.
Due to the significant impact it will have on your running jobs (i.e. they will almost certainly fail), you should make sure that you never exceed the hard limit or the 30-day grace period. Try to stay below the Quota at all times. It is better to ask for a higher storage allocation than to risk hitting your limit and having jobs fail.
It's possible for several projects to merge their allocated storage into a single project directory. If you see fewer available directories than the number of projects you are a member of, that might be why.
Please ask the PI of your project if you don't know where to store data associated with that project. You can find the name of the PI in NSC Express.
NSC recommends that projects give all users their own directory within the project directory to use as a working area. By default, NSC will create /proj/PROJECTDIR/users/USERNAME for all project directories a user has access to, the first time the user logs in. If your project PI has not decided otherwise, you can assume that this is where you should store your data.
You should use the project storage directory for all data associated with the project (input files, results and so on), except for temporary files only needed during the job (these should be stored on the local disk in each compute node, see below).
If you want extra protection for small-volume, high-value data such as source code or scripts, you can store it on /home (or keep an extra copy there, or outside NSC).
The environment variable $SNIC_NOBACKUP is set in the job script environment to the /proj/PROJECTDIR/users/USERNAME directory for the project the job is using, if such a directory exists.
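For example, a job script can change into this directory before starting work. This is a sketch; the program and file names are hypothetical placeholders:

```shell
#!/bin/bash
# Move to the per-user project working area, if it was created
cd "$SNIC_NOBACKUP" || exit 1

# Run from project storage; output lands in the project directory
./my_program > output.log    # hypothetical program
```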
Each compute node has a local hard disk (200 GiB on Kappa, 500 GiB on Matter and Triolith). Most of that disk is available to users for storing temporary files that are only needed during a job.
The environment variable $SNIC_TMP in the job script environment points to a writable directory on the local disk that you can use.
Please note that anything stored on the local disk is deleted when your job ends. If some temporary or output files stored there need to be preserved, copy them to project storage at the end of your job script.
Please use the local disk when possible. By doing so, you're reducing the load on the Centre Storage servers, which makes the shared /home and /proj file systems as fast as possible for you and all other users.
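A common pattern is to stage input data to the local disk, run the job there, and copy the results back before the job ends. This is only a sketch; the file and program names are hypothetical:

```shell
#!/bin/bash
# Stage input from project storage to the fast node-local disk
cp "$SNIC_NOBACKUP/input.dat" "$SNIC_TMP/"

# Run with all temporary I/O on the local disk
cd "$SNIC_TMP" || exit 1
./my_program input.dat > output.dat    # hypothetical program

# Copy results back: everything in $SNIC_TMP is deleted when the job ends
cp output.dat "$SNIC_NOBACKUP/"
```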
If you need help in making your jobs use the local disk, please contact NSC Support.
NSC Centre Storage is only intended for short- and medium-term storage during your project. When your project ends, you must remove your data from Centre Storage.
If you don't have space for the data at your home university, NSC recommends using National Storage for archiving it.
If one project directly replaces another (e.g. SNIC 2014/8-42 continues next year as SNIC 2015/8-26), the project PI can choose to keep the existing storage directory, but connected to the new project. In that case, some data can be kept, and job scripts etc. do not need to be changed.
However, please note that Centre Storage is still not a suitable place to store data long-term (e.g. because there is no tape backup).
We consider the storage system to be very reliable. It is based on the same proven technology (GPFS) as the previous system (which we consider to have been very reliable), but has been improved in several ways, e.g.:
Data on the system is protected against multiple disk failures using "8+2 Reed-Solomon" or better (i.e. two disks out of a group of 10 can fail without affecting access to data). Combined with the short rebuild times after a disk failure, we consider the risk of losing data due to disk failures to be very low.
We also use "snapshots" to protect against you (or NSC) accidentally deleting files.
However, we do not protect you against all types of failures. Some events can lead to loss of data, e.g. large-scale disasters, software bugs, mistakes (by you or by NSC) and intrusions.
After considering the value of our users' data (which often can be recreated by re-running compute jobs), the cost of making off-site backups (which could protect against most disasters and some software bugs, some mistakes and some intrusions), and the low risk of data loss due to the above risks, we have decided to perform only limited tape backups of home directories (weekly) and no tape backups of project storage.
Put differently: for a fixed amount of money available for storage, we bought hard drives, not backup tapes.
If your data is very valuable or irreplaceable, we recommend that you keep copies outside NSC. If you cannot store that data at your home university, we can recommend National Storage.
Yes, sometimes. The system uses "snapshots" (read-only point-in-time views of the file system from which files can be restored).
Snapshots are taken at certain intervals (at least daily) and kept for a certain time (not decided yet, but probably a few days rather than weeks or months).
Snapshots are available on /home and /proj.
To recover deleted files from a snapshot (or check the contents of a file as it was at an earlier time), go to /home/.snapshots. There you will find one directory per available snapshot. Change into the snapshot directory (e.g. cd daily-Thursday), and you will see the files as they were at the time the snapshot was taken. To "undelete" a file, simply copy it to a location outside the .snapshots directory tree.
Files created and deleted in the time between when two snapshots were taken cannot be restored.
Files that were deleted too long ago (before the currently oldest snapshot was taken) cannot be restored from snapshots.
Oops, I have accidentally deleted a file:
[kronberg@triolith1 ~]$ ls -al /proj/nsc/users/kronberg/ior.*
ls: cannot access /proj/nsc/users/kronberg/ior.*: No such file or directory
List the available snapshots:
[kronberg@triolith1 ~]$ ls -lrt /proj/.snapshots/
total 448
drwxr-xr-x 115 root root 32768 Oct 17 16:16 daily-Saturday
drwxr-xr-x 116 root root 32768 Oct 18 17:20 daily-Sunday
drwxr-xr-x 116 root root 32768 Oct 18 17:20 daily-Monday
drwxr-xr-x 117 root root 32768 Oct 20 12:10 daily-Tuesday
drwxr-xr-x 123 root root 32768 Oct 21 14:20 daily-Wednesday
drwxr-xr-x 123 root root 32768 Oct 21 14:20 daily-Thursday
drwxr-xr-x 124 root root 32768 Oct 24 00:00 daily-Friday
Check in which snapshots my missing file is present:
[kronberg@triolith1 ~]$ ls -al /proj/.snapshots/*/nsc/users/kronberg/ior.*
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Monday/nsc/users/kronberg/ior.testfile.triolith
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Saturday/nsc/users/kronberg/ior.testfile.triolith
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 17 16:38 /proj/.snapshots/daily-Sunday/nsc/users/kronberg/ior.testfile.triolith
Restore the file by copying a version (in this case, the latest one) of it:
[kronberg@triolith1 ~]$ cp /proj/.snapshots/daily-Monday/nsc/users/kronberg/ior.testfile.triolith /proj/nsc/users/kronberg/
[kronberg@triolith1 ~]$ ls -al /proj/nsc/users/kronberg/ior.*
-rw-r--r-- 1 kronberg pg_nsc 1099511627776 Oct 27 12:12 /proj/nsc/users/kronberg/ior.testfile.triolith
[kronberg@triolith1 ~]$
For disaster recovery purposes, we make tape backups of the /home and /software directories at least weekly.
If you want us to try to recover a file from this backup, please contact NSC Support.
See this page.
Talk to the Principal Investigator (PI) of your project (log in to NSC Express if you don't know who the PI is).
The PI is responsible for how data is stored in the project directory, and is the one who should ask for more space when needed.
Additional storage is granted to the project's existing directory (e.g. /proj/projname), but NSC may review the amount of storage granted and may then increase or decrease it.
The new storage system consists of three IBM "System x GPFS Storage Server" Model 26 building blocks, a.k.a "GSS26".
The system occupies two 19" racks and consists of six servers and 18 disk enclosures. In total there are 1044 spinning hard disks (4 TB each) and 18 SSD disks (200 GB each).
On this hardware we currently (as of 2014-10-29) run version 2.0 of IBM's GSS software stack.
The total disk space available to store files is approximately 2800 TiB. The difference between the 1044*4 TB of "raw" space on the disks and the available 2800 TiB on the file system is mostly due to the 8+2 Reed-Solomon parity overhead, the difference between decimal TB and binary TiB, and space reserved for metadata and rebuilds.
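The rough arithmetic can be checked; this is only an approximation that ignores spare space, metadata replication and file system overhead:

```shell
# 1044 disks * 4 TB (decimal) converted to TiB (binary), then
# 8+2 Reed-Solomon keeps 8 of every 10 blocks as usable data
awk 'BEGIN {
  raw_tib  = 1044 * 4e12 / 2^40   # ~3798 TiB raw
  data_tib = raw_tib * 8 / 10     # ~3038 TiB after parity
  printf "%.0f TiB raw, %.0f TiB after parity\n", raw_tib, data_tib
}'
```

The remaining gap down to roughly 2800 TiB is the metadata and reserve space mentioned above.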
The storage system is connected to Triolith using four Mellanox FDR 56 Gbit/s InfiniBand links per server. In practice, the hard disks will often be the bottleneck for I/O; the maximum sustained aggregate transfer speed (when writing or reading from many compute nodes simultaneously) that we have seen during testing is around 45 GiB per second. This is more than 10 times the theoretical maximum speed of the previous system.
From a single thread/core on a single Triolith compute node you can expect to read or write up to around 1 GiB per second (as long as the disk system is not overloaded by other jobs). On login and analysis nodes this figure will be higher, around 3.5 GiB/s.
Kappa and Matter are connected to the system using Ethernet, with a maximum total bandwidth of 2.5 GiB/s.
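These figures can be used to estimate how long I/O will take. As a hypothetical example, moving a 100 GiB dataset at the single-core rate versus the Kappa/Matter aggregate rate:

```shell
# Transfer time for 100 GiB at the rates mentioned above
awk 'BEGIN {
  printf "at 1 GiB/s:   %.0f s\n", 100 / 1     # single core on Triolith
  printf "at 2.5 GiB/s: %.0f s\n", 100 / 2.5   # Kappa/Matter aggregate
}'
```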
The total cost (including computer room space, power, cooling, hardware, NSC staff, hardware support, ...) for the planned lifetime of the system (5 years) will be around 15 million SEK, or around 1000 SEK per usable TiB per year.
The power consumption (included in the total cost above) is around 18 kW, or around 6 Watt per TiB of available space.
There are two reasons to limit the number of files stored by a project: (1) certain operations, like checking/repairing the file system, and starting it after certain types of crashes, take time proportional to the number of files in it; (2) every file, even an empty one, consumes a certain amount of storage space for metadata (filename, permissions, timestamps, ...), which is not counted towards the normal quota. The files limit is currently not shown in NSC Express. We will typically be generous when asked to raise this limit; it acts mostly as a tripwire to alert us when a project starts storing data in a problematic way (millions of small files).
File data is protected by an 8+2 Reed-Solomon code, i.e. for every 8 data blocks, 2 parity blocks are stored on disk. Metadata (file system structure, directories, contents of small files) is protected by 3-way replication.
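The storage overhead of these two protection schemes differs substantially, which is why the bulk data uses Reed-Solomon coding and only the (much smaller) metadata uses replication. A back-of-the-envelope comparison:

```shell
# 8+2 Reed-Solomon: 10 blocks stored per 8 blocks of data -> 25% overhead
# 3-way replication: 3 blocks stored per 1 block of data  -> 200% overhead
awk 'BEGIN {
  printf "Reed-Solomon 8+2:  %.2fx raw per byte of data\n", 10 / 8
  printf "3-way replication: %.2fx raw per byte of data\n", 3 / 1
}'
```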