R installations at NSC


Level of support

Tier 2: NSC has only limited experience with this software, but we will try to help as much as possible. We have run some tests (for example, the program's own test suite, if it comes with one), but they may be far from exhaustive. We will try to install and test new versions as soon as we can.

Please see the page describing our software support categories for more information.
You can also contact support@nsc.liu.se for further information.

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Very good starting points to learn about R are the R project web page and the Quick-R web pages.

The R installations at NSC are generally maintained by Johan Raber (raber@nsc.liu.se).

Versions and capabilities

The installed R versions before 3.0 were compiled with GCC and without optimised linear algebra libraries. Versions 3.0 and above were compiled with the Intel compiler and the MKL linear algebra libraries, and outperform the previous installations by as much as a factor of three in some benchmarks. To see how version 3.0.1 was built, check the file /software/apps/R/3.0.1/build.txt.

Integrated development environment (IDE) for R

The IDE Rstudio has become popular among R developers and is therefore available on Triolith. Load it with, for instance,

module load rstudio/0.97.551

We recommend using it through the VNC solution ThinLinc rather than via X forwarding, although X forwarding also works.

How to run

Load the R module corresponding to the version you want to use. To see which versions are available, run

module avail R

We strongly recommend using the latest version of R when you have a choice, for instance

module load R/3.0.1 
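When `module avail R` lists several versions, picking the highest by eye is easy to get wrong, since plain alphabetical order puts 2.9.x after 2.15.x. The helper below is a hypothetical sketch (the name `latest_module_version` is ours, not an NSC command): it takes module names such as R/3.0.1 as arguments and relies on the version ordering of GNU `sort -V`. It deliberately does not parse `module avail` output, whose exact layout we do not assume.

```shell
# Hypothetical helper (not an NSC command): print the highest version
# among module names such as R/2.15.1 R/3.0.1, using GNU sort's
# version ordering (-V) so that 2.15 ranks above 2.9.
latest_module_version() {
    # One name per line, sorted ascending by version; keep the last line.
    printf '%s\n' "$@" | sort -V | tail -n 1
}
```

For example, `latest_module_version R/2.15.1 R/3.0.1 R/2.9.2` prints R/3.0.1, which you would then pass to `module load`.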

For interactive R work, first allocate a node with

interactive -N 1 --exclusive -t 8:00:00 -A <your_project_account>

This allocates one node exclusively to you for 8 hours. The <your_project_account> string is the SNIC or local project name you want to use; if you belong to only one project, the -A option can be omitted. The “projinfo” command lists the projects you belong to. Note that it may take a while to get a node allocated, depending on your priority and the available resources. Your priority is a function of how much of your allocation you have used in the last 30 days, vis-à-vis the priority of everybody else in the batch queue.

If you only need a short interactive session, you can use the development nodes of Triolith. They have a wall-time limit of one hour, but are usually less busy and therefore easier to get allocated, which makes them a good choice for quick debugging. Allocate one with

interactive -N 1 --exclusive -t 1:00:00 -A <your_project_account> --reservation=devel

Once a node has been allocated, either launch R on the command line

R

or load the Rstudio module and launch the IDE with

module load rstudio/0.97.551
rstudio

Using Rstudio requires that you have logged in to the login nodes either with X forwarding or, better yet, through the VNC solution ThinLinc.

A very important flag to know about for “interactive” (and sbatch) is “-C”, which can be used to allocate a “fat” node, i.e. a node with substantially more memory than the 32 GB baseline of Triolith. On Triolith the fat nodes currently have 128 GB of RAM. To get a fat node, add the option “-C fat” to “interactive” or to your batch script.
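Once the job has started, you may want to confirm that you actually got a fat node. A minimal sketch, assuming a Linux node where /proc/meminfo reports MemTotal in kB: the hypothetical helpers below (the names `node_mem_gib` and `is_fat_node` are ours) convert MemTotal to whole GiB and treat anything above 64 GiB as fat. The 64 GiB threshold is also our assumption, chosen to sit well between the 32 GB baseline and the 128 GB fat nodes, because MemTotal is usually somewhat lower than the installed RAM.

```shell
# Hypothetical helper: print a node's total memory in whole GiB, read from
# a meminfo-style file (default: /proc/meminfo on Linux).
node_mem_gib() {
    local meminfo=${1:-/proc/meminfo}
    # MemTotal is given in kB; 1 GiB = 1048576 kB. %d truncates to an integer.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

# Hypothetical check: succeed if the node has more than 64 GiB (assumed
# threshold between 32 GB baseline nodes and 128 GB fat nodes).
is_fat_node() {
    local gib
    gib=$(node_mem_gib "${1:-}")
    # Treat missing/unreadable data as 0 GiB, i.e. not fat.
    [ "${gib:-0}" -gt 64 ]
}
```

Inside the allocated shell, `is_fat_node && echo fat` prints fat only on a node with more than 64 GiB of memory.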

Running R batch scripts

A minimum batch script for running R looks like this:

#!/bin/bash
#SBATCH -N 1
#SBATCH -t 4:00:00
#SBATCH -J jobname
#SBATCH --exclusive
#SBATCH -A SNIC-xxx-yyy

module load R/<desired_version>
R CMD BATCH [options] R_script_name.R

Note that you should edit the job name, project account and desired R version before submitting. The bracketed options are optional and should be removed if you don’t use them. To get a fat node, add the line “#SBATCH -C fat” to the above script.

Caveats

There are some “gotchas” to be aware of:

From version 3.0, the R installations use the threaded version of Intel MKL for the linear algebra routines. By default, loading the R module sets the environment variable OMP_NUM_THREADS to 1 if it was unset at module load time. If you allocate a full node for your work, you should set OMP_NUM_THREADS to 16 to make the best use of the resources: before launching R (or Rstudio), run “export OMP_NUM_THREADS=16” in bash or “setenv OMP_NUM_THREADS 16” in csh. Do not run on the login node with OMP_NUM_THREADS=16!
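Rather than hard-coding 16, you can derive the thread count from the allocation. The sketch below assumes the job runs under Slurm, which exports SLURM_JOB_ID and SLURM_CPUS_ON_NODE inside a job; the function name `omp_threads_for_node` is hypothetical. Outside a job, for example on a login node, it falls back to 1, matching the module's default. With an exclusive Triolith node we expect SLURM_CPUS_ON_NODE to be 16, but check it with `echo $SLURM_CPUS_ON_NODE` before relying on it.

```shell
# Hypothetical helper: choose a value for OMP_NUM_THREADS.
# Inside a Slurm job (SLURM_JOB_ID set), use the number of CPUs Slurm
# allocated on this node (SLURM_CPUS_ON_NODE). Otherwise, e.g. on a
# login node, stay at a single thread.
omp_threads_for_node() {
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -n "${SLURM_CPUS_ON_NODE:-}" ]; then
        echo "$SLURM_CPUS_ON_NODE"
    else
        echo 1
    fi
}
```

In bash, after loading the R module, run `export OMP_NUM_THREADS=$(omp_threads_for_node)` and then launch R.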

Since version 3.0, the R installations at NSC have been built with the Intel compilers, so packages built for older versions are unlikely to be compatible. A quick way to rebuild your old packages for the new version of R is to launch the old R and run

> my_packages <- as.vector(installed.packages(lib.loc = .libPaths()[1])[,1])
> q("yes")

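Now load the module for the new R version and launch R from the same directory, so that the workspace saved by q("yes"), including my_packages, is restored. Then run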

> install.packages(my_packages)

In general, R packages are compatible across bugfix releases but not across feature releases: for R version X.Y.Z, packages can be expected to keep working when only Z changes, but should be reinstalled when X or Y changes.
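The X.Y.Z rule can be expressed as a small check. In this hypothetical sketch (the name `same_feature_release` is ours), the bugfix component Z is stripped with shell parameter expansion and the remaining X.Y series are compared; it assumes both arguments are full three-part versions such as 3.0.1.

```shell
# Hypothetical helper: succeed (exit status 0) if two R versions belong to
# the same X.Y feature release, i.e. packages built for one can be expected
# to work with the other. Assumes full X.Y.Z version strings.
same_feature_release() {
    local old=$1 new=$2
    # ${v%.*} drops the shortest trailing ".something", i.e. the ".Z" part.
    [ "${old%.*}" = "${new%.*}" ]
}
```

For example, `same_feature_release 2.15.3 3.0.1 || echo "reinstall your packages"` prints the reminder, while 3.0.1 and 3.0.2 pass the check.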