Using the login node(s)

When you first log in to an NSC compute cluster (e.g. Triolith), you reach the "login node" (some systems have more than one). This is just a small part of the system: a single Linux server that serves as the connection to the outside world.

It is important to know that a login node is a resource shared by all users of that system, and if it is slow or crashes, all users are affected. For this reason we do not allow you to run anything but the most essential things on the login node.

On the login node, you are permitted to:

  • Run file transfers to and from the system.
  • Manage your files (copy, edit, delete files, etc.).
  • Submit batch and interactive jobs (more about that later).
  • Run applications that use small amounts[1] of CPU and memory.
  • Compile software.
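For example, file transfers to and from the system are usually done with standard tools such as scp or rsync, run from your local machine (the hostname, username, and paths below are placeholders, not real accounts):

```shell
# Copy a local input file to your home directory on the cluster
# (hostname, username and paths are example placeholders)
scp input.dat x_user@triolith.nsc.liu.se:~/project/

# Sync a results directory back to your workstation; rsync only
# transfers files that have changed, which keeps the load on the
# login node low
rsync -av x_user@triolith.nsc.liu.se:~/project/results/ ./results/
```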

A very simple rule is "don't run things on the login node that will inconvenience other users".

The more CPU and memory you use, and the longer you use it, the greater the risk that someone else will suffer. Try to use common sense.

If NSC finds what we consider improper use of the login node, through complaints from other users or automatic monitoring, we might kill or stop your processes. If this happens, we will notify you.

If you are unsure whether a certain task can be run on the login node, please contact us and ask.

Anything not permitted to run on the login node should be run on one or more of the compute nodes in an "interactive" shell or as a batch job.

Interactive jobs

An interactive job is what you use if you "just want to run an application", but on a compute node. This is what happens under the hood when you use the "interactive" command:

  1. You run "interactive", usually with some extra options to use non-default settings, e.g. to request more memory or more CPU cores.
  2. The scheduling system puts your request in the queue, waiting for resources (CPU, memory or a certain node type) to become available.
  3. You wait for the job to start.
  4. The scheduling system starts your job on a suitable compute node, and reserves the amount of memory and CPU cores you requested.
  5. You are automatically logged in to the compute node and can start working.

If your interactive session has not started after 30 seconds, all resources on the system are probably already in use and you will have to wait in the queue. You can check the queue status by logging in to the system again in another window and using the "squeue" command.

Hint: some systems (e.g. Triolith, Kappa) have nodes reserved for small and short interactive sessions. See the system-specific information for how to use the development nodes.

Example interactive session (here I reserve 1 node exclusively for my job for 4 hours on Triolith and start Matlab on it):

[kronberg@triolith1 ~]$ interactive -N1 --exclusive -t 4:00:00
Waiting for JOBID 38222 to start
[kronberg@n76 ~]$ module add matlab/R2012a
[kronberg@n76 ~]$ matlab &

[...using Matlab for an hour or two...]

[kronberg@n76 ~]$ exit
[kronberg@triolith1 ~]$

Remember to end your interactive session by typing "exit". When you do that, the node(s) you reserved are released and become available to other users.

Note: the "interactive" command takes the same options as "sbatch", so you can read the sbatch man page to find out all the options that can be used. The most common ones are:

  • -t HH:MM:SS: choose for how long you want to reserve resources. Choose a reasonable value! If everyone always uses the maximum allowed time, it becomes very difficult to estimate when new jobs can start, and if you forget to end your interactive session, the resources will remain unavailable to other users until the time limit is reached.
  • -N X --exclusive: reserve X whole nodes
  • -n X: reserve X CPU cores
  • --mem X: reserve X megabytes of memory
  • --reservation=devel: use one of the nodes reserved for short test and development jobs
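Combining these options, a small development session might be requested like this (the core count, memory, and time below are examples only; adjust them to your needs):

```shell
# Request 4 CPU cores and 8000 MB of memory for one hour on one of
# the development nodes (all values here are example placeholders)
interactive -n 4 --mem 8000 -t 1:00:00 --reservation=devel
```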

Hint: It is possible to run several terminals "inside" your interactive shell in a way that still stays inside the job. Since the interactive shell is implemented using "screen" (a terminal window multiplexer) you can use all screen features (see the screen man page or the table below).

Table 1: Some common screen commands (read "man screen" for more information):
  Command    What it does
  Ctrl-a c   Create a new terminal inside screen
  Ctrl-a w   List the terminals inside this screen
  Ctrl-a "   List the terminals inside this screen as a menu
  Ctrl-a K   Close the current terminal
  Ctrl-a n   Go to the next terminal
  Ctrl-a A   Name the current terminal
  Ctrl-a h   Write terminal contents to file ("screendump")
  Ctrl-a H   Start/stop logging of terminal to file

Batch jobs

A batch job is a non-interactive (no user input is possible) job. What happens during the batch job is controlled by the job script that is submitted with the job. The job enters the scheduling queue, where it may have to wait for some time until nodes are available to run the job.

Read more about batch jobs and scheduling.
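As a rough sketch, a job script for a batch job might look like the following (the job name, module name, program, and resource requests are placeholders; see the system-specific documentation for real settings):

```shell
#!/bin/bash
#SBATCH -J myjob            # job name shown in the queue
#SBATCH -N 1 --exclusive    # reserve one whole node
#SBATCH -t 2:00:00          # maximum run time (HH:MM:SS)

# Load the software environment the job needs
module add somemodule/1.0   # placeholder module name

# Run the actual program; no user input is possible here,
# so all input must come from files or command-line arguments
./my_program input.dat > output.log
```

You would then submit the script with "sbatch jobscript.sh" and it enters the queue like any other job.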

Logins outside the interactive/batch system

To allow you to monitor and debug running jobs, you can log in to a compute node directly from the login node, provided that you have an active job running on that node.

(If you try to login to a compute node where you do not have a job running you will get the error message "Access denied: user x_XXXXX (uid=NNNN) has no active jobs".)

This feature is only intended for monitoring and debugging running jobs! Do not start any compute jobs from this type of "direct" login! If you do, you circumvent the normal limitations on job length, memory use, etc., and you will likely cause problems for yourself or other users (e.g. causing the node to run out of memory and stop working).

To use this feature, find out which node your job is using (e.g. with squeue -u $USER), then run e.g. ssh n123 from the login node to log in to that compute node. You can then use normal Unix tools like "top" and "ps" to monitor your job.

[x_makro@triolith1 ~]$ ssh n123
Access denied: user x_makro (uid=3375) has no active jobs.
Connection closed by 192.168.192.2
[x_makro@triolith1 ~]$ squeue -u x_makro
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  48584  triolith _interac  x_makro   R       0:09      1 n234
[x_makro@triolith1 ~]$ ssh n234
Last login: Tue Jan 17 11:56:44 2012 from l1
[x_makro@n234 ~]$ top
...

  [1] It is difficult to give exact numbers. If you use more than one CPU core, more than a few GB of RAM, or run for longer than half an hour, please consider the impact on other users and whether you could run on a compute node instead.