The Heffa cluster at NSC is an experimental big data analytics resource of 9 + 19 compute nodes, presently available as a trial environment for the Hadoop software stack. Our main goals with this activity are to provide access to the software stack for testing and development purposes, and to explore the level of interest in these technologies. We are also interested in feedback and testing of our setup, so we can find potential problems and learn more about the technologies from an operational point of view.
We strongly emphasize that this is an experimental resource and is not operated as a regular NSC compute resource. It is subject to bugs, outages, and changes at short notice. The size of the cluster is unlikely to be sufficient for any real production big data analytics tasks. The system is built from hardware from decommissioned NSC clusters, so in terms of hardware it is obviously not state-of-the-art.
| Component | Details |
| --- | --- |
| System server | HP ProLiant DL180 G6 |
| Login nodes | 2 × HP ProLiant SL170z G6 |
| Compute nodes | (9 + 19) × HP ProLiant SL170z G6 |
| Processors | 2 × 4-core Intel Xeon E5520 at 2.2 GHz |
| Number of nodes | 9 (open to external use) + 19 (for internal tests) |
| Traditional file system | 500 GB RAID 6, exported to the nodes over NFS |
| HDFS file system | Hadoop HDFS |
| HDFS size | 4.5 TB, distributed over local 500 GB hard drives |
| Operating system | CentOS Linux 7 |
| Batch queue system | YARN |
| Software | Hadoop HDFS, YARN, MapReduce, Spark |
Allocation periods and amounts of resources are handled on a case-by-case basis. Depending on interest, we may give access to one research group at a time. We kindly request that users give feedback on their experiences and work with us to improve the setup.
To get access, use the SUPR system at: https://supr.snic.se/. Under Rounds: Open for Proposals, find "LiU Local, 2017". Create a new proposal, and make sure to add "Heffa @ NSC" as a resource with 1 x 1000 core hours (the number does not matter in this context). Please "Edit Basic Information" and put down just a couple of sentences about what you hope to try on this resource. (We do not need a full description of a scientific project, since this is just a test resource.)
Access to the cluster is provided either through SSH or the remote desktop solution ThinLinc. Access via SSH will give you a Command Line Interface (CLI) to the system while the ThinLinc access option will present you with an XFCE desktop environment and will let you run graphical applications directly on the login server. These access options are described in detail at https://www.nsc.liu.se/support/graphics/.
Using ThinLinc allows you to run graphical applications at NSC. ThinLinc clients are available free of charge for all major OS platforms (Linux, Mac, Windows) from the ThinLinc download page.
Access by means of SSH is described on https://www.nsc.liu.se/support/getting-started/. Note that your first login to Heffa must be done via SSH, to replace the temporary password you are provided with a permanent one. Be sure to read the notes on security on https://www.nsc.liu.se/support/security/.
Logging in using SSH keys (as described on the Getting Started page) is possible, but with one important caveat: if you wish to access Hadoop/HDFS after logging in using SSH keys, you must manually run `kinit` (and enter your password) on the login node to get a Kerberos ticket, which grants access to HDFS.
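As a sketch, a session after key-based login might look like the following (the login node hostname is an assumption; use the actual address you were given):

```shell
# Log in with an SSH key (hostname is an assumption)
ssh <username>@heffa.nsc.liu.se

# On the login node: obtain a Kerberos ticket (prompts for your password)
kinit

# Verify the ticket, then test that HDFS access works
klist
hadoop fs -ls /user/<username>
```

Without the `kinit` step, the `hadoop fs` command would fail with a Kerberos authentication error even though the SSH login itself succeeded.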
There are several ways to transfer data to and from the cluster, all of which use SSH/SFTP in some fashion. The recommended way is to use the `scp` command from the command line if you are on a Mac/Linux/Unix/(Windows + Cygwin) system, since there is a good chance we can diagnose and fix the problem if something fails when you use it. Graphical SCP clients, e.g. WinSCP and PuTTY on Windows or FileZilla, should also work, but it will be harder for us to help diagnose problems with them. On some platforms you can also use the sshfs FUSE solution to mount the desired file systems, which lets you use regular `cp` or drag-and-drop file manager functionality in a transparent way when transferring data. More detail can be found on https://www.nsc.liu.se/support/copying-data/.
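A few typical `scp` invocations, as a sketch (the hostname and file names are assumptions for illustration):

```shell
# Copy a local file to your NFS home directory on Heffa
scp mydata.csv <username>@heffa.nsc.liu.se:/nfshome/<username>/

# Copy a result file back from Heffa to the current local directory
scp <username>@heffa.nsc.liu.se:/nfshome/<username>/results.txt .

# Copy a whole directory tree recursively
scp -r myproject/ <username>@heffa.nsc.liu.se:/nfshome/<username>/
```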
Persistent 'traditional' user storage on Heffa is provided in `/nfshome/<your_user_name>`. Personal distributed storage in Hadoop is provided in `/user/<your_user_name>`, which you can only access via `hadoop fs` commands. If you have access to other cluster resources at NSC, please note that the traditional user storage on Heffa is not the 'center storage', and is thus completely separate from the storage shared between other NSC clusters. I.e., you need to manually transfer files from other clusters to Heffa, if need be.
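The `hadoop fs` commands mirror familiar Unix file operations. A minimal sketch of moving data between the NFS home and HDFS (file names are hypothetical):

```shell
# List your personal HDFS directory
hadoop fs -ls /user/<username>

# Copy a file from the traditional NFS home into HDFS
hadoop fs -put /nfshome/<username>/input.txt /user/<username>/input.txt

# Print a file stored in HDFS to the terminal
hadoop fs -cat /user/<username>/input.txt

# Copy a result directory back out of HDFS to the NFS home
hadoop fs -get /user/<username>/output /nfshome/<username>/output
```

Note that regular `ls`, `cp`, etc. only see the NFS file system; HDFS paths are visible solely through `hadoop fs`.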
Heffa is running a fairly standard setup of the Hadoop software stack, with HDFS, MapReduce v2, Spark, and YARN.
There are a few simple Hadoop examples in the
Support is presently handled by emailing the administrator of Heffa. However, there is also the general NSC support address. More details can be found on the NSC support page.
Heffa uses the YARN Hadoop resource manager. Hence, submitting jobs works a bit differently from other NSC clusters, which use the Slurm resource manager. We do not presently have any usage limits on jobs submitted to YARN, but please be mindful of other users.
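As a rough sketch, submitting and monitoring work through YARN typically looks like this (the examples jar path and application name are assumptions; the actual locations depend on the Hadoop installation):

```shell
# Run a bundled MapReduce example via YARN
# (jar path is an assumption; locate the actual examples jar on the system)
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100

# Submit a Spark application to YARN instead of running it locally
spark-submit --master yarn --deploy-mode cluster myapp.py

# List running YARN applications, and kill one if needed
yarn application -list
yarn application -kill <application_id>
```

Unlike Slurm, there is no `sbatch` script; YARN schedules the application's containers itself once the job is submitted.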
To see how to interact with yarn, there are some simple Hadoop examples in the
Compiling and running software is done the same way as elsewhere on NSC. Please see https://www.nsc.liu.se/software/index.html to get all the details.
Graphical applications can be run either through X-forwarding over SSH (using the `-Y` flag to the `ssh` command) or via the remote desktop solution ThinLinc. Usage details are described on https://www.nsc.liu.se/support/graphics/; replace any specific system name (e.g. triolith) mentioned on that page with "heffa".