Scheduling policy on Freja

NOTE: To see how scheduling on Freja differs from Bi, read the Bi to Freja migration guide.

Freja uses fairshare to prioritise between jobs from various groups. Each user group on Freja is assigned a number of shares of the cluster. The more resources a group has used, relative to its assigned share of the total available resources, the lower priority the group’s jobs get in the queue.

To improve utilisation the scheduler also uses backfill, which allows jobs to be started out of priority order when this does not delay the predicted start time of higher priority jobs.

Number of shares for user groups

The number of shares is also used to calculate node limits for high priority jobs. Values as of 2024-02-23:

Group (Slurm Account)	Shares
rossby	186
sm_sb	8
sm_sp	8
sm_fouh	9
sm_fouo	186
sm_foum	183
sm_guest	20
sufm	2
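
To put the share counts in relative terms, each group's shares can be divided by the total. The sketch below only does this arithmetic with the values from the table above. The helper name share_percent is made up for the illustration, and the scheduler's actual priority also depends on past usage, which is not modelled here.

```shell
#!/bin/bash
# Shares per Slurm account, copied from the table above (2024-02-23).
shares="rossby:186 sm_sb:8 sm_sp:8 sm_fouh:9 sm_fouo:186 sm_foum:183 sm_guest:20 sufm:2"

# Sum all share counts.
total=0
for entry in $shares; do
    total=$((total + ${entry#*:}))
done

# share_percent is a hypothetical helper: prints a share count as a
# percentage of the total, with one decimal.
share_percent() {
    awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f", 100 * s / t }'
}

# Print one line per account: name and percentage of all shares.
for entry in $shares; do
    printf '%-10s %5s%%\n' "${entry%%:*}" "$(share_percent "${entry#*:}" "$total")"
done
```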

Job types

There are two types of jobs on Freja:

normal jobs
low priority jobs

There is also a tool, boost-tools, available to all users, which lets a project adjust the scheduling of its own jobs:

A project may increase the priority of a small number of jobs for added flexibility using boost-tools.
A project may increase the time limit of a small number of jobs beyond the normal maximum (7 days) using boost-tools.
A project may reserve nodes for a certain time period using boost-tools.

Normal jobs

The vast majority of jobs should be submitted as normal jobs.

Jobs are prioritised using fairshare.

Low priority jobs

Low priority jobs have lower priority than normal and high priority jobs, and will only be started if no other jobs need the requested resources.

Low priority jobs have a maximum allowed wall time of 4 hours.

Usage of low priority jobs is "free" and is NOT included when calculating fairshare priority for normal jobs.

To submit low priority jobs, use --qos=low.
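
A minimal sketch of a low priority job script. Only --qos=low and the 4 hour limit come from the policy above; the task count, the job name and the workload are placeholders to replace with your own.

```shell
#!/bin/bash
# Run as a low priority job.
#SBATCH --qos=low
# Stay within the 4 hour limit for low priority jobs.
#SBATCH -t 04:00:00
# Placeholders: a single task and an example job name.
#SBATCH -n 1
#SBATCH -J lowprio-example

# Placeholder workload; replace with your own program.
run_workload() {
    echo "Low priority job ${SLURM_JOB_ID:-unknown} started"
}
run_workload
```

Outside a Slurm job the SLURM_JOB_ID variable is unset, so the script prints "unknown" instead of a job ID.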

Specifying Slurm Account

If you are a member of more than one group, you should always pass an option such as -A rossby or -A sm_fouo to sbatch or interactive to tell Slurm which account to run under.

If you are only part of one group you do not need to use the -A option for normal job submission. You might have to use it under special circumstances, such as cron jobs.
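
For example, a wrapper or cron script can pass the account explicitly. This is a sketch assuming a job script called job.sh (a hypothetical name) and the rossby account from the table above. The echo prints the command instead of running it, so the snippet is harmless outside the cluster.

```shell
#!/bin/bash
# Account to run under; rossby is one of the accounts listed above.
account="rossby"
# Hypothetical job script name.
job_script="job.sh"

# Build the submission command. To actually submit, run the command
# instead of printing it.
submit_cmd="sbatch -A $account $job_script"
echo "$submit_cmd"
```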

Time limits

The maximum wall time for a job is 7 days (except for low priority jobs, which have a 4 hour limit). The default time limit (if you do not use the “-t” flag) is 2 hours. Please use the “-t” flag to set a time limit that is appropriate for each job!

Avoid running long jobs if the work can be split into several shorter jobs without losing performance. Several shorter jobs can improve the overall scheduling on the cluster. However, there are limits as Freja is not optimised for very short jobs. For example, splitting a 30 minute job into 30 1-minute jobs is not recommended.
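
Slurm accepts time limits in days-hours:minutes:seconds form. The sketch below converts a whole number of hours into that form and rejects values above the 7 day maximum. The name to_slurm_time is a made-up helper for this illustration, not a Slurm command.

```shell
#!/bin/bash
# Maximum wall time for normal jobs on Freja: 7 days, expressed in hours.
max_hours=$((7 * 24))

# to_slurm_time is a hypothetical helper: converts whole hours to the
# D-HH:MM:SS form accepted by "sbatch -t", refusing values over the limit.
to_slurm_time() {
    local hours=$1
    if [ "$hours" -gt "$max_hours" ]; then
        echo "error: $hours h exceeds the $max_hours h limit" >&2
        return 1
    fi
    printf '%d-%02d:00:00\n' $((hours / 24)) $((hours % 24))
}

to_slurm_time 50     # prints 2-02:00:00
to_slurm_time 168    # prints 7-00:00:00
```

The result can be passed straight to sbatch, for example sbatch -t "$(to_slurm_time 50)" job.sh, where job.sh is again a placeholder.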

Fat nodes

Freja has 3 fat nodes with extra memory. To use them, add -C fat to your job specification. Do not use --mem or similar options to request fat nodes or to specify that you do not need fat nodes.

Use of the fat nodes counts towards fairshare usage at double the cost of normal nodes. Jobs not requesting fat nodes can be scheduled on fat nodes if no other nodes are available, but are then not charged the extra cost.

All job types can request fat nodes.
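
A sketch of a job script requesting a fat node. Only -C fat comes from the text above; the node count, time limit and workload are placeholders. Note that no --mem option is used.

```shell
#!/bin/bash
# Request a fat node through the constraint, not through --mem.
#SBATCH -C fat
# Placeholders: one full node for one hour.
#SBATCH -N 1
#SBATCH -t 01:00:00

# Placeholder workload; replace with your own program.
run_workload() {
    echo "Job running on $(uname -n)"
}
run_workload
```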

Node sharing is available on Freja. The idea behind node sharing is that you do not have to allocate a full compute node in order to run a small job. Thus, if you request a job like sbatch -n 1 ... the job may share the node with other jobs smaller than 1 node. Jobs using a full node or more will not experience this (that is, we will not pack two 70-core jobs into 3 nodes). You can turn off node-sharing for otherwise eligible jobs using the --exclusive flag.

Using node sharing is highly recommended on Freja since there are 64 cores per node, but only 78 nodes.

Warning: If you do not pass -n, -N or --exclusive to commands like sbatch and interactive, you will get a single core, not a full node.

When you allocate less than a full node, you get a proportional share of the node’s memory. On a thin node with 384 GiB, that means that you get slightly less than 6 GiB per allocated core.
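
The arithmetic behind this can be sketched as follows. The figures are those given above for a thin node; cores_for_mem is a hypothetical helper, and the idea of sizing a job's core count from its memory need is an inference from the proportional sharing described here. Because the usable amount is slightly below 6 GiB per core, the result is a lower bound, and a job close to the boundary may need one more core.

```shell
#!/bin/bash
# Thin node figures from the text above.
node_mem_gib=384
cores_per_node=64

# Upper bound on memory per core; the usable amount is slightly lower.
mem_per_core=$((node_mem_gib / cores_per_node))

# cores_for_mem is a hypothetical helper: smallest core count whose
# proportional share reaches the given memory in GiB, using the upper bound.
cores_for_mem() {
    local need_gib=$1
    echo $(( (need_gib + mem_per_core - 1) / mem_per_core ))
}

echo "At most $mem_per_core GiB per core"
echo "20 GiB needs at least $(cores_for_mem 20) cores"
```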

Note: you cannot request a fat node on Freja by passing a --mem or --mem-per-cpu option too large for thin nodes. You need to use the -C fat option discussed above.

Job private directories

Each compute node has a local disk, backed by flash storage, with approximately 865 GiB available for user files. The environment variable $SNIC_TMP in the job script environment points to a writable directory on the local disk that you can use. Each job has private copies of the following directories used for temporary storage:

/scratch/local (`$SNIC_TMP`)
/tmp
/var/tmp

This means that one job cannot read files written by another job running on the same node. This applies even if it is two of your own jobs running on the same node!

Please note that anything stored on the local disk is deleted when your job ends. If some temporary or output files stored there need to be preserved, copy them to project storage at the end of your job script.
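
The pattern can be sketched as below. $SNIC_TMP is the variable described above; the results directory and the RESULTS_DIR variable are hypothetical stand-ins for your project storage path, and the workload is a placeholder. The fallback to a temporary directory is only there so that the sketch also runs outside a job.

```shell
#!/bin/bash
# Inside a job, SNIC_TMP points to the job-private directory on local disk.
# Outside a job it is unset, so fall back to a temporary directory.
workdir="${SNIC_TMP:-$(mktemp -d)}"

# Hypothetical destination; replace with your project storage path.
results_dir="${RESULTS_DIR:-$PWD/results}"
mkdir -p "$results_dir"

# Placeholder workload writing its output to the local disk.
echo "example output" > "$workdir/output.txt"

# Copy what must be kept before the job ends and the local disk is wiped.
cp "$workdir/output.txt" "$results_dir/"
```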
