On some of our clusters (currently Tetralith and Sigma, but this feature could be made available on other systems), we offer a way to combine the local disks or RAM in each compute node of a multi-node job into one large area for temporary files.
NOTE: you should NOT use this feature if you just want local scratch space in a single compute node. All compute nodes automatically have /scratch/local available for all jobs.
To enable this feature for a job, request the feature (constraint) "scratchjob" for your job.
BeeGFS will only have optimal performance if all the compute node disks are of the same size and type. On Tetralith and Sigma, different node types have different disk types. Therefore I suggest that you request nodes of one type for your job.
Four "diskS" nodes, giving a total of ~800 GiB (4 * 200 GB SSD):
sbatch -N4 -C scratchjob,diskS
Two "diskM" nodes, giving a total of ~1700 GiB (2 * 960 GB SSD) :
sbatch -N2 -C scratchjob,diskM
Three "diskL" nodes (only in Tetralith), giving a total of ~5000 GiB (3 * 2 TB NVME):
sbatch -N3 -C scratchjob,diskL
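The constraints above can also be put in a complete batch script. The following is a minimal sketch; "my_app", "input.dat" and "results.dat" are placeholders for your own program and files, and the staging steps assume a workflow where data is copied in at the start and results copied out at the end:

```shell
#!/bin/bash
#SBATCH -N 4
#SBATCH -C scratchjob,diskS
#SBATCH -t 01:00:00

# Stage input into the shared /scratch/job area (visible from all
# nodes in the job), run the application there, then copy results
# back before the job ends (the area is removed when the job ends).
cp input.dat /scratch/job/
cd /scratch/job
srun ./my_app input.dat          # my_app and input.dat are placeholders
cp results.dat "$SLURM_SUBMIT_DIR"/
```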
Since the number of nodes with large disks is limited, only use them if you really need the larger (and, in the case of the "gpu" nodes, faster) disks.
To enable this feature, request the feature (constraint) "scratchjobram" for your job. This will create a file system using a RAM disk (if your application only uses a small amount of RAM, approximately 90% of the RAM in each compute node can be used for /scratch/job).
BeeGFS will only have optimal performance if all the compute nodes have the same amount of RAM. Therefore I suggest that you request nodes of one type for your job.
Four "thin" nodes, giving a total of ~180 GiB (96 GB RAM per node):
sbatch -N4 -C scratchjobram,thin
Two "fat" nodes, giving a total of ~370 GiB (384 GB RAM per node):
sbatch -N2 -C scratchjobram,fat
As the Tetralith GPU nodes only have 96 GB RAM each, it does not make sense to request them for scratchjobram unless you will also use the GPUs for your application.
The RAM disk will usually be much faster, but less space is available per node, and it also limits how much RAM your application can use.
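Since the RAM disk space is limited, it can be worth checking that your data actually fits before staging it. A sketch, assuming the dataset path is in a hypothetical DATASET variable set earlier in the job script:

```shell
# Compare the dataset size with the free space on /scratch/job
# (both in KiB) before copying anything. DATASET is a placeholder.
NEEDED_KB=$(du -sk "$DATASET" | cut -f1)
AVAIL_KB=$(df -k /scratch/job | awk 'NR==2 {print $4}')
if [ "$NEEDED_KB" -lt "$AVAIL_KB" ]; then
    cp -r "$DATASET" /scratch/job/
else
    echo "dataset does not fit in /scratch/job" >&2
    exit 1
fi
```

This only runs inside a job with the scratchjobram (or scratchjob) feature, where /scratch/job exists.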
Like most distributed file systems, BeeGFS gives you best performance if you use large files.
BeeGFS on a single node has approximately the same performance as using the local disk directly (i.e. /scratch/local): 300-500 MiB/s, but with some CPU overhead, so you probably want to use /scratch/local instead.
BeeGFS on multiple nodes will generally give aggregated performance of up to 500 MiB/s per node in the job. If you use the RAM disk, you can get several GiB/s per node.
Please note that BeeGFS can use significant amounts of CPU when the application is doing I/O to /scratch/job. For optimal overall application performance it might make sense to not use all the CPU cores for the application, but leave some for BeeGFS. How many? That depends on how much I/O you will be doing. If you will be using job-local storage for a large number of jobs, it might make sense to do some benchmarking first with different numbers of cores reserved for BeeOND to find out where the optimum is for your job type.
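As a sketch of how reserving cores might look in a job script: the numbers below are purely illustrative (32 cores per Tetralith node, 2 reserved for BeeGFS); benchmark to find values that suit your application.

```shell
# Leave RESERVED cores per node for BeeGFS I/O and give the rest to
# the application. Both values are illustrative assumptions.
CORES_PER_NODE=32
RESERVED=2
APP_CORES_PER_NODE=$(( CORES_PER_NODE - RESERVED ))
echo "$APP_CORES_PER_NODE"
# In the job script you would then start the application with e.g.
#   srun --ntasks-per-node=$APP_CORES_PER_NODE ./my_app
```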
If you need help in making your jobs use /scratch/job, please contact NSC Support.