
3rd Annual Workshop on
Linux Clusters for Super Computing

Clusters for High Performance Computing and GRID Solutions

23-25 October, 2002
Hosted by
National Supercomputer Centre (NSC)
Linköping University, SWEDEN





Programme & Abstracts





October 23

13:15 - 15:00 Tutorial 1: Using Linux Clusters at NSC
Peter Kjellström, NSC
in R6, building C
13:15 - 15:00 Tutorial 2: Building PC-clusters - putting the bits and pieces together
Lennart Karlsson, NSC
in C3, building C
15:15 - 18:00 Tutorial 3: Grid Computing on the NorduGrid Testbed: a hands on tutorial
Balazs Konya, Lund University
in C4, building C
18:15 - 20:00 The Race to Petaflops: Getting it Right
Thomas Sterling, Caltech
Co-organized with Lysator, Linköping University Computer Society and their seminar series UppLYSning
in C4, building C
Everybody is welcome!

October 24

in Collegium
09:00 Registration
Coffee and Tea outside the auditorium
10:00 Inauguration Session
Welcome to NSC
Matts Karlsson, director NSC
Welcome to Linköping University
Bertil Andersson, rector LiU
NSC in a National Context
Anders Ynnerman, director SNIC
Technical Presentation of NSC's new supercomputer
Niclas Andersson, NSC
Inauguration of NSC's new teraflop-scale computer
Madeleine Lejonhufvud, Deputy Director General, Swedish Research Council
Hans Sandebring, Director General, SMHI
11:15 Workshop Keynote: Petaflop scale Computing with Linux Commodity Clusters
Thomas Sterling, Caltech
12:15 LUNCH
in Collegium restaurant, ground floor
Session 1: Applications
13:30 Parallel Computing: a route to complexity and reality in material simulations
Shiwu Gao, Dept. of Applied Physics, Chalmers Univ. of Technology
14:00 Designing a cluster for geophysical fluid dynamics applications
Göran Broström, Earth Sciences Centre, Göteborgs Univ.
14:30 Grid generation for Neuro-Mechanical Networks
Magnus Sethson, Dept. of Mechanical Engineering, Linköping Univ.
15:00 BREAK
Coffee and Tea outside the auditorium
Session 2: Applications and GRID
15:30 Achieving Design Targets through Stochastic Optimization - Current practice in Automotive and Aerospace industries
Petter Sahlin, EASi Engineering
16:00 Experiences in parallelizing two geophysical models at SMHI
Tomas Wilhelmsson, SMHI
Fruit and refreshments
16:45 Grid Enabled Optimisation and Design Search for Engineering (GEODISE)
Simon J. Cox, School of Engineering Sciences, Univ. of Southampton
17:15 NorduGrid - a Nordic Grid
Mattias Ellert, Dept. of Radiation Sciences, Uppsala Univ.
17:30 -
Tours of NSC and Sweden's fastest computer system
(every 20 minutes)
at NSC, building G, LiU - within walking distance from Collegium
19:30 DINNER
in Atmosfär at Konsert & Kongress, downtown, see map

October 25

Session 3: Portals and User Access
08:30 Transparent access to finite element applications using grid technology
Jonas Lindemann, Lunarc, Lund Univ.
09:00 Röde Orm: a computational portal for numerical computations of Optical problems
Manuel Lopez Quiroga-Teixeiro, gridCore AB
09:30 Experiences in Management and in finding external users for the Hirmu Cluster
Michael Gindonis, Helsinki Institute of Physics, Technology Programme
10:00 BREAK
Coffee and Tea outside the auditorium
Session 4: Building Clusters
10:30 The HPC2N Super Cluster: From Bits and Pieces to a Benchmarked TOP100 Supercomputing System
Erik Elmroth, High Performance Computing Centre North (HPC2N), Umeå Univ.
11:00 Challenges in building large Linux clusters
Ole Holm Nielsen, Dept. of Physics, Technical Univ. of Denmark
11:30 PIM and the Challenge to Linux for supercharged Clusters
Thomas Sterling
12:00 LUNCH
in Collegium restaurant, ground floor
Session 5: Vendor Solutions
13:00 From Beowulf to Professional Turnkey Solutions
Einar Rustad, SCALI
13:30 Linux Cluster Solutions with IBM
Kathleen Bonadonna, IBM
14:00 Linux Clusters from HP for Scalable Scientific Computing
Martin Anthony Walker, HP
Fruit and refreshments
Session 6: Future
14:42 The future of x86 based High Performance Computing
Francesco Torricelli, AMD
15:42 Closing remarks


Using Linux Clusters at NSC
Peter Kjellström, NSC

Who should attend:

  • Anyone interested in linux cluster usability
  • Current and future users of NSC cluster systems

Only the most basic Unix familiarity will be assumed.

This tutorial aims to introduce the participant to the NSC cluster environment (NCE). The primary purpose of the NCE is to make life easier for the users. This is done by integrating various components such as compilers and MPI-libraries.

Topics included:

  • Software environment overview
  • Compilers
  • Available MPI implementations
  • The Maui scheduler
  • Understanding system load
  • Compiling MPI applications
  • Running interactively
  • Running in batch

Building PC Clusters - putting the bits and pieces together
Lennart Karlsson, NSC

Who should attend:

  • Anyone curious about how a Beowulf cluster is constructed.
  • Anyone who is going to build a Beowulf cluster and would like some advice.

A basic understanding of IP networks and Unix-like systems will be assumed. (To actually build a cluster, you need also some Linux and network administration skills.)

Practical advice will be given on how to build a computing cluster, based on experiences at the Swedish National Supercomputer Centre.

Topics include:

  • Why build clusters?
  • Parallel jobs versus single-processor jobs.
  • The components and environment of a cluster. An overview.
  • What makes the cluster tick? The life of a parallel computing job.
  • Planning, buying and setting up your cluster.

Grid Computing on the NorduGrid Testbed: a hands-on tutorial (2-3 hours)
Balazs Konya, NorduGrid

Linux clusters are the fundamental constituents of Grids, whose ultimate goal is to provide "transparent access" to shared computing resources belonging to multiple administrative domains. The Grid can ease access to these (super)computing facilities.

The tutorial aims to give a "real life experience" of present-day Grid technologies by using the NorduGrid Testbed. No prior knowledge is assumed; the tutorial is open to everybody, and HPC users are especially welcome.


  • short overview of the concept of Grid computing
  • available middleware solutions, the NorduGrid Toolkit
  • NorduGrid Testbed overview, Grid services, architecture
  • what do you need to use the (Nordu)Grid?
  • logging onto the Grid: certificates & single sign-on, security issues
  • the "Hello world" on the Grid
  • overview of a Grid session: job submission, job monitoring, "output management"
  • resource discovery: what is available on the Grid? The Information System
  • formulating a Grid job request: the eXtended Resource Specification Language (XRSL)
  • User Interface (command line tools) & built-in Resource Broker
  • data access on the Grid, replicas & storage elements
  • participants will be assisted in trying to "put their application onto the Grid"
  • future plans
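
As a taste of the XRSL job description covered in the tutorial, a minimal request might look like the following sketch (attribute names follow the NorduGrid XRSL notation; the script name and values here are hypothetical):

```
&(executable="hello.sh")
 (arguments="world")
 (inputFiles=("hello.sh" ""))
 (stdout="hello.out")
 (cpuTime="10 minutes")
 (jobName="grid-hello")
```

Such a request would be submitted through the command-line User Interface and matched to a suitable cluster by the built-in Resource Broker.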


The Race to Petaflops: Getting it Right
Thomas Sterling

The steady increase in performance of high end computing systems, as reflected by the Top-500 list, demonstrates an average performance gain of approximately 1.8X per year, as measured by the Linpack benchmark, over a period of almost a decade.

This apparent sustained rate of growth obscures highly non-linear trends in the underlying system architectures. Whereas nine years ago vector, SIMD, and SMP architectures dominated much of the HPC landscape, today almost all of the top performing systems are MPPs and commodity clusters (including Constellations), with the Japanese Earth Simulator, an MPP of vector microprocessors, the fastest general purpose system at 40 Teraflops peak, establishing the midpoint (logarithmically) of the trans-Teraflops performance regime.

The implication of these trends is that Petaflops-scale computing systems will become available at the beginning of the next decade, but that the class of system architecture may have to be very different from the MPP and cluster systems of today.
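
A back-of-envelope check of this timescale can be made by extrapolating the 1.8X-per-year Linpack trend cited above (a sketch only; it takes the 40 Teraflops Earth Simulator figure as the 2002 baseline, and real sustained application performance will lag such peak-rate projections):

```python
import math

# Extrapolate the ~1.8x-per-year Top-500 Linpack growth trend.
# Baseline (assumption): the Earth Simulator at ~40 Tflops peak in 2002.
growth_per_year = 1.8
baseline_tflops = 40.0
target_tflops = 1000.0  # 1 Petaflops

# years needed for the top system to grow by the required factor
years = math.log(target_tflops / baseline_tflops) / math.log(growth_per_year)
print(f"~{years:.1f} years, i.e. around {2002 + years:.0f}")
```

By this crude measure the number-one system would cross 1 Petaflops peak around 2007-2008, with Petaflops-scale systems becoming generally available somewhat later.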

This presentation describes some of the possible alternative system architectures that may drive computing into the trans-Petaflops regime. In particular, hybrid technology and processor in memory (PIM) architectures will be examined in their various forms. Of equal importance is how such systems will address the critical factors that contribute to performance degradation and inefficiency, including latency, overhead, starvation, and contention. The talk will conclude with a brief discussion of the new Cray Cascade Petaflops computer project being sponsored by DARPA.

Keynote: Petaflop scale Computing with Linux Commodity Clusters
Thomas Sterling

Commodity cluster computing is the single fastest growing class of high performance computing system architecture. Almost half of the systems on the Top-500 list, as determined by the Linpack benchmark, are commodity clusters (including Constellations). By far the most abundant family of commodity clusters are Linux clusters of low-cost PCs, such as Beowulf-class systems. In the United States today, one cluster with a peak capability of 9.2 Teraflops is being constructed at Lawrence Livermore National Laboratory by Linux NetworX, and a Grid of Linux clusters with an aggregate peak performance of 11.6 Teraflops is being assembled at four National Science Foundation sites. It is likely that Linux clusters will lead the way to future performance goals.

When and how, then, will such systems ultimately achieve Petaflops-scale performance for real-world applications? This address will examine the technology, architecture, and software issues that will determine the roadmap leading to Linux clusters in the trans-Petaflops performance regime. The presentation will include projections derived from the Semiconductor Industry Association's predictions for future chip technology, as well as extrapolations taken directly from the database of the Top-500 list. One important aspect of the future of commodity clusters is the way in which the nodes comprising future clusters may evolve in response to technology opportunities and market forces. The talk will conclude by demonstrating that the likely timeframe for a Petaflops-scale Linux cluster is 2011-2012, at a cost of approximately $10 million.

Parallel Computing: a route to complexity and reality in material simulations
Shiwu Gao

This talk contains two related parts. In the first part, I will present our recent progress in parallelizing the WIEN package, a full-potential (linearized) augmented plane wave (FP-(L)APW) method that calculates materials properties from ab initio electronic structure calculations based on density functional theory. Both the parallelization scheme and test results on the IBM SP3 (PDC) and the new Linux cluster (HPC2N) will be given [1]. A comparison of performance and scaling on the two machines will be presented and discussed.

The second part of the talk presents a few applications of ab initio simulation methods (WIEN and VASP) to the materials and surface problems. The following examples will be given: i) Adsorption induced hydrogen bonding by CH group[2]; ii) Vibrational recognition of hydrogen-bonded water networks on a metal surface[3]; and iii) An electronic picture of hydrophilic and hydrophobic interactions at surfaces[4].

[1] Shiwu Gao, Linear-scaling parallelization of WIEN package with MPI (to be published).
[2] Shiwu Gao, J. R. Hahn, and W. Ho, Phys. Rev. Lett. (2002), submitted.
[3] Sheng Meng, L. F. Xu, E. G. Wang, and Shiwu Gao, Phys. Rev. Lett. (2002) in print.
[4] Sheng Meng, E. G. Wang, B. Kasemo, and S. Gao, Phys. Rev. Lett. (2002), submitted.

Designing a cluster for geophysical fluid dynamics applications
Göran Broström

Realistic simulations of oceanographic and atmospheric processes require a great deal of computer power. The codes we use are publicly available and are generally parallelized to run with, for instance, MPI. However, the basic physics of these codes implies an intense exchange of data between CPUs. The interconnect between CPUs therefore becomes critical for the computational speed of the cluster, and a fast interconnect (e.g., SCI cards) is needed for best performance. Furthermore, the computational speed on a single processor is typically memory-bound, implying that fast memory buses should be used in clusters for geophysical fluid dynamics (GFD). In this presentation I will show some of the work we have done to design a 48-CPU cluster for GFD applications. The performance of the cluster will also be discussed.
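
The memory-bound claim can be made concrete with a rough roofline-style estimate. The sketch below uses a stream-like triad kernel and invented hardware numbers; they are not measurements of the cluster described here:

```python
# A triad update a[i] = b[i] + s * c[i], typical of GFD stencil code,
# moves three 8-byte doubles (two loads, one store) for every 2 flops.
bytes_per_flop = (3 * 8) / 2

peak_gflops = 2.0         # hypothetical per-CPU arithmetic peak, Gflop/s
mem_bandwidth_gbs = 2.0   # hypothetical memory-bus bandwidth, GB/s

# The sustainable flop rate is capped by how fast operands arrive
# from memory, not by the CPU's arithmetic peak.
bandwidth_bound_gflops = mem_bandwidth_gbs / bytes_per_flop
sustained = min(peak_gflops, bandwidth_bound_gflops)
print(f"bandwidth-bound rate: {sustained:.2f} Gflop/s "
      f"({100 * sustained / peak_gflops:.0f}% of peak)")
```

With these numbers the kernel sustains only a small fraction of peak, which is why a faster memory bus helps a GFD cluster more than a faster clock.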

Grid generation for Neuro-Mechanical Networks
Magnus Sethson

Network systems are attracting more and more interest within the area of mechanical systems engineering. Their natural relation to biological systems is obvious, as is their fascinating ability to assemble simple elements into structures that solve complex tasks. By combining mechanical engineering with neural networks we get a generic tool for creating mechanisms and variable structures that can be very flexible.

We are currently evaluating such behaviors and therefore developing a framework for dynamical simulations of such networks. The calculations take place using a network of simple elements or actuators. In the search for numerically stable simulation environments we have created a grid generation tool that tries to automatically generate a network grid suitable for such calculations. The NMN, Neuro-Mechanical Networks, can be used for a variety of applications from bio-mimicking to shape-changing airfoils.

The network is characterized by its random structure and the discrete length of each simple element. This relates closely to the structure of tissues within our bodies, especially the human heart. Presented are the first tests of such a grid generation tool, using genetic algorithms to establish the discrete length characteristics within given limits. The huge number of elements needed to get a relevant resolution of the tissue properties leads to large-scale optimizations. The first numerical and scaling findings of a new library for genetic algorithms, implemented on Linux clusters, are presented.

Achieving Design Targets through Stochastic Optimization - Current practice in Automotive and Aerospace industries
Petter Sahlin

The first full-scale stochastic automotive crash simulation was run at BMW in 1997. Stochastic simulation requires extensive compute resources, at that time some 700 CPUs. Despite this, the method was rapidly adopted since it:

  • enables decisions based on simulation alone earlier in the development process
  • accounts for the occurrence of uncertainty/scatter in input parameters
  • enables validation of physical and digital tests
  • defines the robustness of a complex design or a simulation
  • enables efficient optimisation of a complex design, system or simulation

Today the method is implemented on a large scale in several leading automotive and aerospace organizations. One reason for the increasing pace of adoption is rapidly increasing access to cheap CPUs, enabled by the availability of Linux clusters and Grid computing. This presentation offers a background on stochastic simulation and why it is used, as well as an update on how the method is deployed in areas such as CFD, crashworthiness and occupant simulation, NVH, durability, fatigue, mass reduction, and multidisciplinary design optimization.
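
The core of the stochastic method can be sketched in a few lines: draw input parameters from their scatter distributions, run the simulation for each sample, and examine the spread of the response. The toy model and all numbers below are invented for illustration:

```python
import random

random.seed(42)  # reproducible toy experiment

def response(thickness, yield_strength):
    # hypothetical scalar crash response: thinner or softer
    # material gives a larger (worse) value
    return 100.0 / (thickness * yield_strength)

samples = []
for _ in range(10_000):
    t = random.gauss(2.0, 0.05)   # sheet thickness in mm, with scatter
    y = random.gauss(1.0, 0.03)   # normalized yield strength, with scatter
    samples.append(response(t, y))

mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

# The spread of the response quantifies the robustness of the design.
print(f"mean response {mean:.2f}, scatter (std) {std:.2f}")
```

Each sample is an independent simulation, which is why the method parallelizes trivially across the hundreds of cheap CPUs a Linux cluster or Grid provides.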

Experiences in parallelizing two geophysical models at SMHI
Tomas Wilhelmsson

The HIROMB model (HIgh Resolution Operational Model of the Baltic Sea) delivers daily forecasts of currents, temperature, salinity, water level, and ice conditions. HIROMB has been parallelized using a block-based grid decomposition, and load balance is fine-tuned by assigning multiple blocks to each processor. Computation time in the winter season is dominated by the model's visco-plastic ice dynamics component, whose parallelization was complicated by the need for a direct sparse matrix solver. We use ParMETIS to load-balance the ice solver in each time step.
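
The multiple-blocks-per-processor idea can be illustrated with a toy greedy heuristic (HIROMB itself uses ParMETIS; the block costs below are invented):

```python
def assign_blocks(costs, nprocs):
    """Assign each block, largest first, to the least-loaded processor."""
    load = [0.0] * nprocs
    owner = {}
    for b in sorted(range(len(costs)), key=lambda i: -costs[i]):
        p = min(range(nprocs), key=lambda q: load[q])
        owner[b] = p
        load[p] += costs[b]
    return owner, load

# hypothetical per-block computation costs: 8 grid blocks on 3 CPUs
costs = [9.0, 7.5, 6.0, 5.5, 4.0, 3.0, 2.0, 1.0]
owner, load = assign_blocks(costs, 3)
print(load)  # per-processor loads end up close to sum(costs) / 3
```

Giving each processor several small blocks rather than one large one is what makes this fine-tuning possible; a real partitioner such as ParMETIS additionally accounts for communication between neighbouring blocks.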

The MATCH model (Multiscale Atmospheric Transport and CHemistry) is a regional Eulerian air-pollution dispersion model. For nuclear emergency response applications, a Lagrangian particle model describes the initial dispersion of pollutants from point sources. MATCH has been parallelized with a same-source approach using a Fortran loop and index translation tool (FLIC) and a parallel runtime library (RSL), both developed at Argonne National Laboratory.

Grid Enabled Optimisation and Design Search for Engineering (GEODISE)
Prof. Simon J. Cox

GEODISE is developing grid-based seamless access to an intelligent knowledge repository, a state-of-the-art collection of optimisation and search tools, industrial strength analysis codes, and distributed computing and data resources.

Engineering design search and optimisation is the process whereby engineering modelling and analysis are exploited to yield improved designs. In the next 2-5 years intelligent search tools will become a vital component of all engineering design systems and will steer the user through the process of setting up, executing and post-processing design search and optimisation activities. Such systems typically require large-scale distributed simulations to be coupled with tools to describe and modify designs using information from a knowledge base. These tools are usually physically distributed and under the control of multiple elements in the supply chain. Whilst evaluation of a single design may require the analysis of gigabytes of data, to improve the process of design can require assimilation of terabytes of distributed data. Achieving the latter goal will lead to the development of intelligent search tools.

Our focus is on the use of computational fluid dynamics with BAE Systems, Rolls Royce, and Fluent. GEODISE is being developed by the Universities of Southampton, Oxford and Manchester in collaboration with other industrial partners working in the domains of hardware (Intel), software (Microsoft), systems integration (Compusys), knowledge technologies (Epistemics), and grid-middleware (Condor).

NorduGrid - a Nordic Grid
Mattias Ellert

The NorduGrid project started in May 2001 as a collaboration between the Nordic countries to establish a grid infrastructure in the region. By connecting several computer clusters located at different locations a computing grid is formed. Users can then submit requests for the execution of computational tasks to the grid and the task is transferred to one of the clusters on the grid that has the hardware and software required to do the job. The grid set up by the NorduGrid project has successfully been used by High Energy Physicists to do Monte Carlo simulations of particle collisions in the ATLAS detector at CERN.

Transparent access to finite element applications using grid technology
Jonas Lindemann

Making clusters and grids available to a wider audience is an important task. Today, using finite element software, or computational software in general, on clusters and grids involves several steps before a job can be executed. For many users this procedure is cumbersome, and the powerful resources of a grid are therefore not utilized. To change this, the procedure must be made simpler and easier to use. To this end a concept cluster is currently being developed, using PHP and CORBA. The cluster will have web-based access for job submission, monitoring and result retrieval.

Röde Orm: a computational portal for numerical computations of Optical problems
Manuel Lopez Quiroga-Teixeiro

At the Photonics Laboratory at Chalmers University of Technology a Linux cluster is used for intensive computations. Optical problems at that Department involve both stochastic based simulations for optical fiber systems and integrated laser structures having sizes of many wavelengths. Solutions using Monte Carlo methods and parallelism are implemented as main parts of this computational portal.

Experiences in Management and in finding external users for the Hirmu Cluster
Michael Gindonis

In the fall of 2000 the Technology Programme of the Helsinki Institute of Physics received funding from the Ehrnrooth Foundation to build a modern PC cluster for research purposes and to give non-traditional users access to high performance computing. This presentation will cover observations in the following areas: remote management of staff, user support, and user needs and expectations. It will also attempt to define criteria that lead to successful collaborations.

The HPC2N Super Cluster: From Bits and Pieces to a Benchmarked TOP100 Supercomputing System
Erik Elmroth

In Spring 2002, HPC2N built the first Swedish Linux cluster with supercomputer capacity. The cluster consists of 240 rack-mounted AMD MP2000+ processors, interconnected with a low latency, high bandwidth, 3D torus SCI network. The system is truly self-made by HPC2N, including the specification and building of all the individual nodes. The system has a peak performance of 800 Gflops/s and was ranked 94 on the TOP500 list of the world's fastest computers in June 2002, with an HP Linpack benchmark result of 480.7 Gflops/s. This presentation includes a description of our system and the work to build it, as well as performance analyses for processors, dual nodes, network bandwidth, and full system scalability. Benchmark results presented include HP Linpack, NAS Parallel, STREAM, and Pallas MPI Benchmarks.
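
For reference, the figures quoted above translate into a Linpack efficiency of about 60% of peak:

```python
# Linpack efficiency implied by the numbers in the abstract above
peak_gflops = 800.0
linpack_gflops = 480.7
efficiency = linpack_gflops / peak_gflops
print(f"HP Linpack efficiency: {efficiency:.1%}")
```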

Challenges in building large Linux clusters
Ole Holm Nielsen

Linux clusters with hundreds of nodes pose a number of challenges that are absent in clusters with just tens of nodes. We have recently built a 480-node cluster using standard Pentium 4 office PCs, with a total peak performance of 2.1 TeraFLOPS.

This presentation will discuss the issues of shelf construction, cooling, and how to cope with a truck-load of PC boxes. Our choice of networking technology and topology will be discussed. Automated Linux installation over the network is described, as are the kinds of servers we have chosen for administration and NFS file service. Finally, we discuss our batch production environment.

PIM and the Challenge to Linux for supercharged Clusters
Thomas Sterling

Two emergent architectures are establishing the likely directions for future high performance computing. These are scalable commodity clusters and processor in memory (PIM) technology. The potential merger of clusters and PIM presents an exciting opportunity to achieve unprecedented performance while improving performance to cost. Substituting PIM devices for at least part of the main memory of cluster nodes can dramatically enhance performance capabilities while providing acceleration for data intensive computational problems. However, a number of challenges impose barriers to achieving this opportunity. This talk will describe the architectural and software issues related to exploiting future generation PIM devices in commodity clusters.

From Beowulf to Professional Turnkey Solutions
Einar Rustad

Scali's software technology was developed in parallel with the very early market for clusters in academia. The goal was to develop highly efficient and robust software for the two areas critical to making clusters a viable alternative to traditional supercomputers and cc-NUMA machines: cluster communication and cluster management. This enables both efficient execution of a wide range of applications and cost-effective operation and management for users and system administrators.

Clusters based on Scali's software technology are now being offered to industrial users by large hardware vendors such as HP and Dell, in addition to local distributors and system integrators worldwide. The key asset of Scali's software solutions is intimate knowledge of processor, memory, and interconnect architectures, as well as an overall understanding of parallel applications and their requirements. Scali works closely with ISVs, HSVs, and customers to obtain ultimate performance for end-user applications.

Linux clusters from HP for scalable scientific computing
Martin Anthony Walker

Achieving high sustained application performance on compute clusters imposes hard requirements on the balance among the speed of the processors, and the bandwidth and latency of memory access and inter-node communication, as well as I/O performance. The usability of large scale clusters requires appropriate file systems and system software for cluster and workload management. HP's approach to these issues will be presented, with concrete examples from current installations based on Itanium 2 processors.

Linux Cluster Solutions with IBM
Kathleen Bonadonna

Linux is important to IBM. It is an integral part of the Internet, is rapidly becoming the application development platform of choice, and is increasingly being used in high-performance computing. Over recent years, IBM has become an industry leader in providing Linux solutions and a key part of the worldwide Linux community. From a Linux cluster perspective, IBM has installed clusters across the world, including many of the largest ones in existence today. IBM's focus is to offer fully integrated and tested clusters based on IBM's xSeries rack-optimized servers, providing greater flexibility, superior manageability, excellent price performance, and the ability to create powerful, flexible solutions for high-performance computing. The talk will also cover IBM's future direction in Linux clusters, including blades and other exciting technology.
