When you first log in to an NSC compute cluster (e.g. Triolith), you reach the "login node" (some systems have more than one). The login node is only a small part of the system: a single Linux server that serves as the connection to the outside world.
It is important to know that the login node is a resource shared by all users of the system; if it becomes slow or crashes, all users are affected. For this reason we do not allow you to run anything but the most essential tasks on the login node.
On the login node, you are permitted to perform only light tasks such as editing files, compiling software, managing your data, and submitting and monitoring jobs.
A very simple rule is "don't run things on the login node that will inconvenience other users".
The more CPU and memory you use, and the longer you use it, the greater the risk that someone else will suffer. Try to use common sense.
If NSC finds what we consider improper use of the login node, either through complaints from other users or through automatic monitoring, we may kill or suspend your processes. If this happens, we will notify you.
If you are unsure whether a certain task can be run on the login node, please contact us and ask.
Anything not permitted to run on the login node should be run on one or more of the compute nodes in an "interactive" shell or as a batch job.
An interactive job is what you use if you "just want to run an application", but on a compute node. This is what the "interactive" command gives you.
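The exact mechanics of the "interactive" wrapper are system-specific, but under the hood it is roughly equivalent to requesting an allocation and starting a screen session on the first allocated compute node. The commands below are standard Slurm commands; the combination is a simplified sketch, not the actual implementation:

```shell
# Hypothetical sketch of what "interactive" does (details vary per system):
# 1. Ask the scheduler for resources, passing your options through.
# 2. Once the allocation is granted, start "screen" on a compute node
#    via a pseudo-terminal, giving you an interactive shell inside the job.
salloc -N1 --exclusive -t 4:00:00 srun --pty screen
```

When the screen session ends, the allocation is released.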
If your interactive session has not started after 30 seconds, all resources on the system are probably already in use and you will have to wait in the queue. You can check the queue status by logging in to the system again in another window and using the "squeue" command.
Hint: some systems (e.g. Triolith, Kappa) have nodes reserved for small, short interactive sessions. See the system-specific information for how to use these development nodes.
Example interactive session (here I reserve one node exclusively for my job for 4 hours on Triolith and start Matlab on it):

```
[kronberg@triolith1 ~]$ interactive -N1 --exclusive -t 4:00:00
Waiting for JOBID 38222 to start
[kronberg@n76 ~]$ module add matlab/R2012a
[kronberg@n76 ~]$ matlab &
[...using Matlab for an hour or two...]
[kronberg@n76 ~]$ exit
[kronberg@triolith1 ~]$
```
Remember to end your interactive session by typing "exit". When you do that, the node(s) you reserved are released and become available to other users.
Note: the "interactive" command takes the same options as "sbatch", so you can read the sbatch man page to find out all the options that can be used. The most common ones are:
- `-t HH:MM:SS`: choose how long you want to reserve the resources. Choose a reasonable value! If everyone always uses the maximum allowed time, it becomes very difficult to estimate when new jobs can start, and if you forget to end your interactive session, the resources remain unavailable to other users until the time limit is reached.
- `-N X --exclusive`: reserve X whole nodes
- `-n X`: reserve X CPU cores
- `--mem X`: reserve X megabytes of memory
- `--reservation=devel`: use one of the nodes reserved for short test and development jobs
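These options can be combined. For example, a small interactive session on a development node might be requested as below (the resource values are illustrative, not recommendations):

```shell
# Reserve 4 CPU cores and 8000 MB of memory for 30 minutes
# on a development node, then start an interactive shell there:
interactive -n 4 --mem 8000 -t 00:30:00 --reservation=devel
```

Requesting only what you need lets the scheduler fit your job in sooner and leaves more resources for other users.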
Hint: it is possible to run several terminals "inside" your interactive shell in a way that still stays inside the job. Since the interactive shell is implemented using "screen" (a terminal multiplexer), you can use all screen features (see the screen man page or the table below).
| Command  | What it does |
|----------|--------------|
| Ctrl-a c | Create a new terminal inside screen |
| Ctrl-a w | List the terminals inside this screen |
| Ctrl-a " | List the terminals inside this screen as a menu |
| Ctrl-a K | Close the current terminal |
| Ctrl-a n | Go to the next terminal |
| Ctrl-a A | Name the current terminal |
| Ctrl-a h | Write terminal contents to file ("screendump") |
| Ctrl-a H | Start/stop logging of the terminal to file |
A batch job is a non-interactive job (no user input is possible). What happens during the batch job is controlled by the job script that is submitted with the job. The job enters the scheduling queue, where it may have to wait for some time until nodes are available to run it.
Read more about batch jobs and scheduling.
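As an illustration, a minimal job script might look like the sketch below. The job name, module, and input/output file names are made up for this example; adapt the resource requests and software to your own job:

```shell
#!/bin/bash
# Hypothetical example job script -- adjust the options, module name,
# and file names to your system and application before use.
#SBATCH -J myjob              # job name shown in the queue
#SBATCH -t 01:00:00           # requested wall time
#SBATCH -N 1 --exclusive      # one whole node

module add matlab/R2012a      # load the software the job needs
matlab -nodisplay < myscript.m > result.out
```

You would submit such a script with `sbatch jobscript.sh` and follow its progress with `squeue -u $USER`.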
To allow you to monitor and debug running jobs, you can log in to a compute node directly from the login node, provided that you have an active job running on that node.
(If you try to login to a compute node where you do not have a job running you will get the error message "Access denied: user x_XXXXX (uid=NNNN) has no active jobs".)
This feature is only intended for monitoring and debugging running jobs! Do not start any compute jobs from this type of "direct" login! If you do, you circumvent the normal limitations on job length, memory use, etc., and you will likely cause problems for yourself or other users (e.g. causing the node to run out of memory and stop working).
To use this feature, find out which node your job is using (e.g. with `squeue -u $USER`), then run e.g. `ssh n123` from the login node to log in to that compute node. You can then use normal Unix tools like `top` and `ps` to monitor your job.
```
[x_makro@triolith1 ~]$ ssh n123
Access denied: user x_makro (uid=3375) has no active jobs.
Connection closed by 192.168.192.2
[x_makro@triolith1 ~]$ squeue -u x_makro
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
  48584  triolith _interac  x_makro  R   0:09      1 n234
[x_makro@triolith1 ~]$ ssh n234
Last login: Tue Jan 17 11:56:44 2012 from l1
[x_makro@n234 ~]$ top
...
```
It's difficult to give exact numbers. If you use more than one CPU core, more than a few GB of RAM, or run for longer than half an hour, please consider the impact on other users and whether you can run on a compute node instead.↩