R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
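R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes an effective data handling and storage facility, a suite of operators for calculations on arrays (in particular matrices), a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display, and a well-developed, simple and effective programming language.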
The R installations at NSC are generally maintained by Johan Raber.
The installed R versions before 3.0 were compiled with GCC and without optimised linear algebra libraries. Versions 3.0 and above were compiled with the Intel compiler and the MKL linear algebra libraries, and outperform the previous installations by as much as a factor of three in some benchmarks. If you are interested in how it was built, see the file /software/apps/R/3.0.1/build.txt.
The Rstudio IDE has become quite popular among R developers and is therefore available on Triolith. Load it with, for instance:
module load rstudio/0.97.551
We recommend using it through the VNC solution ThinLinc rather than via X forwarding, although X forwarding also works.
Load the R module corresponding to the version you want to use. To see which versions are available, run:
module avail R
We strongly recommend using the latest version of R when you have a choice, for instance:
module load R/3.0.1
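When several R versions are installed, picking the newest one can be scripted. The following is a minimal sketch, assuming module names of the form R/X.Y.Z and GNU `sort -V` for version ordering; the function name `latest_r_module` is hypothetical, and the exact `module avail` output format varies between module systems:

```shell
#!/bin/bash
# Hypothetical helper: read text containing module names such as "R/3.0.1"
# on stdin and print the module with the highest version number.
latest_r_module() {
    # Extract tokens like R/2.15.1 or R/3.0.1, dropping suffixes such as "(default)"
    grep -oE 'R/[0-9]+(\.[0-9]+)*' |
        # Version-aware ordering (GNU sort), so 2.15.1 sorts after 2.9.0
        sort -V |
        # Keep only the last, i.e. newest, entry
        tail -n 1
}
```

For example, `module load "$(module avail R 2>&1 | latest_r_module)"`. Many module systems print the `module avail` listing on stderr, hence the `2>&1`; check the printed name before relying on it in a job script.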
For interactive R work, first allocate a node:
interactive -N 1 --exclusive -t 8:00:00 -A <your_project_account>
This allocates one node exclusively for you for 8 hours. The <your_project_account> string is the SNIC or local project name you want to use; if you have only one project, the -A option can be omitted. The “projinfo” command lists the projects you belong to. Note that it may take a while to get a node allocated, depending on your priority and the available resources. Your priority depends on how much of your allocation you have used in the last 30 days, relative to the priority of everybody else in the batch queue.
For a shorter interactive session, you can use the development nodes of Triolith. They have a wall-time limit of only one hour, but are usually less busy and therefore easier to get allocated, which makes them a good choice for quick debugging. Allocate one like this:
interactive -N 1 --exclusive -t 1:00:00 -A <your_project_account> --reservation=devel
Once a node is allocated, either launch R on the command line:
R
or load the Rstudio module and launch the IDE:
module load rstudio/0.97.551
rstudio
Using Rstudio requires that you have logged in to the login nodes either with X forwarding or, better yet, through the VNC solution ThinLinc.
A very important flag to know about for “interactive” (and sbatch) is “-C”, which can be used to allocate a “fat” node, i.e. a node with substantially more memory than the 32 GB installed in a standard Triolith node. The fat nodes on Triolith currently have 128 GB of RAM. To get a fat node, add the option “-C fat” to your “interactive” command or to your batch script.
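The allocation variants described here (a regular node, a development node and a fat node) differ only in a few flags. As an illustration, here is a hypothetical helper, `interactive_cmd`, that prints the corresponding `interactive` command line instead of running it. It is a sketch that assumes the flag combinations shown in this section, not an NSC-provided tool:

```shell
#!/bin/bash
# Hypothetical helper: print (not run) an "interactive" command line.
#   $1: mode, one of "normal", "devel" or "fat"
#   $2: project account (optional; may be omitted if you have only one project)
#   $3: wall time for "normal" and "fat" (default 8:00:00)
interactive_cmd() {
    local mode=$1 account=${2:-} walltime=${3:-8:00:00}
    local -a args=(interactive -N 1 --exclusive)
    case $mode in
        normal) args+=(-t "$walltime") ;;
        # Development nodes: one-hour wall-time limit and the "devel" reservation
        devel)  args+=(-t 1:00:00 --reservation=devel) ;;
        # Fat nodes: request the "fat" node feature with -C
        fat)    args+=(-t "$walltime" -C fat) ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Only pass -A when an account was given
    if [ -n "$account" ]; then
        args+=(-A "$account")
    fi
    echo "${args[*]}"
}
```

For example, `interactive_cmd fat SNIC-xxx-yyy` prints the command for an 8-hour fat-node session. Printing rather than executing keeps the sketch safe to try on a login node.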
A minimal batch script for running R looks like this:
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 4:00:00
#SBATCH -J jobname
#SBATCH --exclusive
#SBATCH -A SNIC-xxx-yyy
module load R/<desired_version>
R CMD BATCH [options] R_script_name.R
Note that you should edit the job name, account number and desired R version before submitting. The bracketed options are, of course, optional and should be removed if you don’t use them. To get a fat node, add the line “#SBATCH -C fat” to the script above.
There are some “gotchas” to be aware of:
From version 3.0, the R installations use the threaded versions of Intel MKL for the linear algebra routines, and loading the R module sets the environment variable OMP_NUM_THREADS to 1 if it was unset at module load time. If you allocate a full node for your work, set OMP_NUM_THREADS to 16 to make the best use of its resources, e.g. with “export OMP_NUM_THREADS=16” in bash or “setenv OMP_NUM_THREADS 16” in csh, before launching R (or Rstudio). Do not run on the login node with OMP_NUM_THREADS=16!
From version 3.0, the R installations at NSC were built with the Intel compilers, so old packages are unlikely to be compatible. A quick way to rebuild your old packages for the new version of R is to launch the old R and run:
> my_packages <- as.vector(installed.packages(lib.loc = .libPaths())[,1])
> q("yes")
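Now load the new R version module and, from the same directory (so that the workspace saved by q("yes") is restored), launch R and run:
> install.packages(my_packages)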
In general, R packages are compatible between bugfix releases but not between feature releases: for R version X.Y.Z, compatibility can be expected across different Z releases within the same X.Y series, but not between different X or Y releases.
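Both gotchas can be partly scripted. The sketch below uses hypothetical helper names. `set_omp_threads` assumes that Slurm exports SLURM_CPUS_ON_NODE inside a job and falls back to 1 when it is unset, for example on a login node; if hyper-threading were enabled, that variable could count logical CPUs rather than the 16 physical cores, so check the value it prints. `same_feature_release` encodes the X.Y rule from the previous paragraph:

```shell
#!/bin/bash
# Hypothetical helpers for the gotchas above; names are illustrative.

# Set OMP_NUM_THREADS from the Slurm allocation instead of hard-coding 16.
# Assumes SLURM_CPUS_ON_NODE is set inside a job; outside a job it is
# unset (or empty), so we fall back to a single thread.
set_omp_threads() {
    export OMP_NUM_THREADS=${SLURM_CPUS_ON_NODE:-1}
    echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
}

# Packages built for R X.Y.Z can be expected to work with another release
# only if X and Y match. Returns 0 if so, 1 if packages should be rebuilt.
same_feature_release() {
    local a b
    # Keep only the "X.Y" part of each version string, e.g. 3.0.1 -> 3.0
    a=$(printf '%s\n' "$1" | cut -d. -f1,2)
    b=$(printf '%s\n' "$2" | cut -d. -f1,2)
    [ "$a" = "$b" ]
}
```

In a batch script, calling `set_omp_threads` just before `R CMD BATCH` replaces the hard-coded export, and `same_feature_release 2.15.3 3.0.1 || echo "rebuild your packages"` prints the reminder because 2.15 and 3.0 are different feature releases.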