NSC operates an archive for SMHI that can be accessed through a file system view. This is implemented via the HSM (hierarchical storage management) functionality of Spectrum Scale (GPFS), backed by tape storage. Access is through the SSH-based file transfer protocol SFTP.
Only a limited subset of the data is stored on disk in the archive, but all directories and files are visible as if they were. Data that is only available on tape is transparently fetched in the background; conversely, data uploaded to the archive is migrated to tape in the background. The disk storage should be seen as a cache.
Accessing Webb requires a separate account; it is not the same as the accounts on other SMHI systems. Please email email@example.com to request one.
Since most of the data is physically located on tape, there will be an initial delay while it is fetched. Depending on the amount of data, and whether it was uploaded at different times, a request may require mounting several tape volumes, so access times can vary considerably. Transferring data in a batch job to a local server is encouraged, so that the transfer does not depend on a workstation or laptop staying connected.
Large files are in general better suited to a streaming medium such as tape, so we recommend packing small files into larger archives to improve recall latency. Due to the way the HSM resource is accessed (SFTP/NFS), usually only one file at a time will be recalled, since the reading processes open files serially. If the goal is to access many files, extracting an archive onto local storage is usually preferable.
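As a sketch of the pack-then-extract workflow, the example below uses Python's standard tarfile module. The directory and archive names are hypothetical; any equivalent tool (e.g. the tar command) works just as well.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> None:
    """Pack all files under src_dir into one tar archive before uploading.

    One large archive is recalled from tape as a single stream, instead of
    one recall per small file.
    """
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)


def unpack_archive(archive_path: str, dest_dir: str) -> None:
    """Extract a downloaded archive onto local storage for processing."""
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(dest_dir)
```

After downloading such an archive from Webb, extract it locally and work on the extracted copies rather than reading individual files over SFTP.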
Bandwidth is not the main issue; a single tape drive can deliver higher bandwidth than a single hard drive, but the latency is much higher on tape. Mounting a tape cartridge and winding the tape to the right spot can take a couple of minutes if the tape library is heavily loaded and the data is at the end of the tape. Tapes are dismounted fairly quickly (one minute after the last access), so if the next request for the same tape arrives later, the process starts anew.
How large the archive files should be depends on how the data is going to be used. When creating archive files, take the expected read workflow into account. For example, time-based data series are usually suitable to pack into weekly or monthly archives.
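The monthly packing idea can be sketched as follows. The example assumes file names that start with a YYYYMM prefix (a hypothetical naming scheme); adjust the grouping key to match your own data.

```python
import tarfile
from collections import defaultdict
from pathlib import Path


def pack_by_month(src_dir: str, out_dir: str, prefix_len: int = 6) -> list[str]:
    """Write one tar archive per month for files named like 202401_xxx.dat.

    The first prefix_len characters of each file name (assumed YYYYMM) are
    used as the grouping key. Returns the paths of the created archives.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for f in sorted(Path(src_dir).iterdir()):
        if f.is_file():
            groups[f.name[:prefix_len]].append(f)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for month, files in sorted(groups.items()):
        archive = out / f"{month}.tar"
        with tarfile.open(archive, "w") as tar:
            for f in files:
                tar.add(f, arcname=f.name)
        archives.append(str(archive))
    return archives
```

A whole month of data is then recalled as one sequential read from tape, instead of one recall per file.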