Programme & Abstracts
|
October 23
|
Tutorials
|
| 13:15 - 15:00 |
Tutorial 1: Using Linux Clusters at NSC
Peter Kjellström, NSC
in R6, building C
|
outline |
| 13:15 - 15:00 |
Tutorial 2: Building PC-clusters - putting the bits and pieces together
Lennart Karlsson, NSC
in C3, building C
|
outline |
| 15:15 - 18:00 |
Tutorial 3: Grid Computing on the NorduGrid Testbed:
a hands-on tutorial
Balazs Konya, Lund University
in C4, building C
|
outline |
Presented material and exercises |
|
Seminar
|
| 18:15 - 20:00 |
The Race to Petaflops: Getting it Right
Thomas Sterling, Caltech
Co-organized with Lysator,
Linköping University Computer Society
and their seminar series
UppLYSning
in C4, building C
Everybody is welcome!
|
abstract |
October 24
|
in Collegium
|
| 09:00 |
Registration
Coffee and Tea outside the auditorium
|
| 10:00 |
Inauguration Session |
|
Welcome to NSC
Matts Karlsson, director NSC
|
|
Welcome to Linköping University
Bertil Andersson, rector LiU
|
|
NSC in a National Context
Anders Ynnerman, director SNIC
|
|
Technical Presentation of NSC's new supercomputer
Niclas Andersson, NSC
|
|
Inauguration
of NSC's new teraflop-scale computer
Madeleine Lejonhufvud, Deputy Director General, Swedish Research Council
Hans Sandebring, Director General, SMHI
|
| |
| 11:15 |
Workshop Keynote:
Petaflop scale Computing with Linux Commodity Clusters
Thomas Sterling, Caltech
|
abstract |
| |
| 12:15 |
LUNCH
in Collegium restaurant, ground floor
|
| |
| Session 1: Applications |
| 13:30 |
Parallel Computing: a route to complexity
and reality in material simulations
Shiwu Gao,
Dept. of Applied Physics, Chalmers Univ. of Technology
|
abstract |
slides (ppt) |
| 14:00 |
Designing a cluster
for geophysical fluid dynamics applications
Göran Broström,
Earth Sciences Centre, Göteborgs Univ.
|
abstract |
slides (ppt) |
| 14:30 |
Grid generation for Neuro-Mechanical Networks
Magnus Sethson,
Dept. of Mechanical Engineering, Linköping Univ.
|
abstract |
slides (PDF) |
| |
| 15:00 |
BREAK
Coffee and Tea outside the auditorium |
| |
| Session 2: Applications and GRID |
| 15:30 |
Achieving Design Targets through Stochastic Optimization
- Current practise in Automotive and Aerospace industries
Petter Sahlin,
EASi Engineering
|
abstract |
| 16:00 |
Experiences in parallelizing two geophysical models at SMHI
Tomas Wilhelmsson,
SMHI
|
abstract |
| |
| 16:30 |
SHORT BREAK
Fruit and refreshments
|
| |
| 16:45 |
Grid Enabled Optimisation and
Design Search for Engineering (GEODISE)
Simon J. Cox,
School of Engineering Sciences, Univ. of Southampton
|
abstract |
slides (ppt) |
| 17:15 |
NorduGrid - a Nordic Grid
Mattias Ellert,
Dept. of Radiation Sciences, Uppsala Univ.
|
abstract |
slides (PDF) |
| |
| 17:30 - 18:50 |
Tours of NSC and Sweden's fastest computer system
(every 20 minutes)
at NSC, building G, LiU
- within walking distance from Collegium
|
| |
| 19:30 |
DINNER
in Atmosfär at Konsert & Kongress, downtown, see
map
|
October 25
| Session 3: Portals and User Access |
| 08:30 |
Transparent access to finite element applications
using grid technology
Jonas Lindemann,
Lunarc, Lund Univ.
|
abstract |
slides (ppt) |
| 09:00 |
Röde Orm: a computational portal
for numerical computations
of Optical problems
Manuel Lopez Quiroga-Teixeiro,
gridCore AB
|
abstract |
| 09:30 |
Experiences in Management and in finding
external users for the Hirmu Cluster
Michael Gindonis,
Helsinki Institute of Physics, Technology Programme
|
abstract |
| |
| 10:00 |
BREAK
Coffee and Tea outside the auditorium |
| |
| Session 4: Building Clusters |
| 10:30 |
The HPC2N Super Cluster: From Bits and Pieces to
a Benchmarked TOP100 Supercomputing System
Erik Elmroth,
High Performance Computing Centre North (HPC2N), Umeå Univ.
|
abstract |
| 11:00 |
Challenges in building large Linux clusters
Ole Holm Nielsen,
Dept. of Physics, Technical Univ. of Denmark
|
abstract |
slides (PDF) |
| 11:30 |
PIM and the Challenge to Linux for supercharged Clusters
Thomas Sterling
|
abstract |
| |
| 12:00 |
LUNCH
in Collegium restaurant, ground floor
|
| |
| Session 5: Vendor Solutions |
| 13:00 |
From Beowulf to Professional Turnkey Solutions
Einar Rustad, SCALI
|
abstract |
slides (ppt) |
| 13:30 |
Linux Cluster Solutions with IBM
Kathleen Bonadonna, IBM
|
abstract |
| 14:00 |
Linux Clusters from HP for Scalable Scientific Computing
Martin Anthony Walker, HP
|
abstract |
| 14:30 |
SHORT BREAK
Fruit and refreshments
|
| Session 6: Future |
| 14:42 |
The future of x86 based High Performance Computing
Francesco Torricelli, AMD
|
| 15:42 |
Closing remarks
|
Tutorials
Using Linux Clusters at NSC
Peter Kjellström, NSC
Who should attend:
- Anyone interested in linux cluster usability
- Current and future users of NSC cluster systems
Only the most basic unix familiarity will be assumed
This tutorial aims to introduce the participant to the
NSC cluster environment (NCE). The primary purpose of the
NCE is to make life easier for the users. This is done by
integrating various components such as compilers and
MPI-libraries.
Topics included:
- Software environment overview
- Compilers
- Available MPI implementations
- The Maui scheduler
- Understanding system load
- Compiling MPI applications
- Running interactively
- Running in batch
Building PC Clusters - putting the bits and pieces together
Lennart Karlsson, NSC
Who should attend:
- Anyone curious about how a Beowulf cluster is constructed.
- Anyone who is going to build a Beowulf cluster and would like
some advice.
A basic understanding of IP networks and Unix-like systems will be
assumed. (To actually build a cluster, you also need some Linux
and network administration skills.)
Practical advice will be given on how to build a computing cluster, based
on experiences at the Swedish National Supercomputer Centre.
Topics included:
- Why build clusters?
- Parallel jobs versus single-processor jobs.
- The components and environment of a cluster. An overview.
- What makes the cluster tick? The life of a parallel computing job.
- Planning, buying and setting up your cluster.
Grid Computing on the NorduGrid Testbed:
a hands-on tutorial (2-3 hours)
Balazs Konya, NorduGrid
Linux Clusters are the fundamental constituents of the Grids, whose
ultimate goal is to provide "transparent access" to shared
computing resources belonging to multiple administrative domains.
The Grid can ease and facilitate the access to these (super)computing
facilities.
The tutorial aims to give a "real life experience" of present-day Grid
technologies by using the NorduGrid Testbed. No prior knowledge is assumed.
The tutorial is open to everybody, and HPC users are especially welcome.
Outline:
- short overview of the concept of Grid computing
- available middleware solutions, the NorduGrid Toolkit
- NorduGrid Testbed overview, Grid services, architecture
- what do you need to use the (Nordu)Grid?
- logging onto the Grid: the certificate & single sign-on, security issues
- the "Hello world" on the Grid
- overview of a Grid session: job submission, job monitoring, "output
management"
- resource discovery: what is available on the Grid? The Information
System
- formulating a Grid job request:
the eXtended Resource Specification Language (XRSL)
- User Interface (command line tools) & built-in Resource Broker
- data access on the Grid, replicas & storage elements
- participants will be assisted in trying to "put their application onto the
Grid"
- future plans
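As a rough illustration of the "Hello world" and XRSL items in the outline above, the sketch below builds a job description string in Python. The attribute names (executable, arguments, stdout, jobName) are written from memory of the NorduGrid documentation and should be treated as assumptions to check against the XRSL manual; the helper to_xrsl and the job name are hypothetical.

```python
def to_xrsl(attributes):
    """Render a dict of job attributes as a single xRSL conjunction.

    Each attribute becomes (name="value"); the leading "&" means that
    all of the listed relations must hold.
    """
    parts = [f'({name}="{value}")' for name, value in attributes.items()]
    return "&" + "".join(parts)


# Hypothetical "Hello world" job: run /bin/echo and capture its output.
hello_job = {
    "executable": "/bin/echo",
    "arguments": "Hello world",
    "stdout": "hello.txt",
    "jobName": "hello-grid",
}

xrsl = to_xrsl(hello_job)
print(xrsl)
```

The resulting string is what a user would hand to the command line tools mentioned in the outline; the submission step itself is not shown here.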
Abstracts
The Race to Petaflops: Getting it Right
Thomas Sterling
The steady increase in performance of high end computing systems as
reflected by the Top-500 list demonstrates an average performance gain
of a factor of approximately 1.8X per year as measured by the Linpack
benchmark over a baseline of almost a decade.
This apparent sustained rate of growth obscures the highly
non-linear trends in the underlying system architectures. Whereas nine
years ago vector, SIMD, and SMP architectures dominated much of the
HPC landscape, today almost all of the top performing systems are MPPs
and commodity clusters (including Constellations). The fastest general
purpose system is the Japanese Earth Simulator, an MPP of vector
microprocessors with a peak of 40 Teraflops, which establishes the mid
point (logarithmically) in the trans-Teraflops performance regime.
The implication of these trends is that Petaflops scale computing
systems will become available at the beginning of the next decade but
that the class of system architecture may have to be very different
from the MPP and cluster systems of today.
This presentation describes some of the possible alternative system
architectures that may drive computing into the trans-Petaflops
regime. In particular, hybrid technology and processor in memory (PIM)
architectures will be examined in their various forms. Of equal
importance is how such systems will address critical factors that
contribute to performance degradation and inefficiency including
latency, overhead, starvation, and contention. The talk will conclude
with a brief discussion on the new Cray Cascade Petaflops computer
project being sponsored by DARPA.
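The growth figure quoted in the abstract above can be turned into a back-of-the-envelope extrapolation. The sketch below assumes that the 1.8X annual factor simply continues to compound and, as a simplification, applies it to the 40 Teraflops peak figure even though the abstract states the factor for Linpack results. It is an illustration of the arithmetic, not the speaker's own projection.

```python
import math


def years_to_reach(target_flops, start_flops, annual_factor=1.8):
    """Years of compound growth needed to go from start_flops to target_flops."""
    return math.log(target_flops / start_flops) / math.log(annual_factor)


# Figures from the abstract: 40 Teraflops peak, growth of 1.8X per year.
earth_simulator_peak = 40e12
petaflops = 1e15

years = years_to_reach(petaflops, earth_simulator_peak)
print(f"{years:.1f} years at 1.8X per year")
```

Under these assumptions the gap of a factor of 25 closes in between five and six years, which is earlier than the "beginning of the next decade" stated in the abstract, so the result should be read only as a sanity check on the order of magnitude.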
Keynote: Petaflop scale Computing with Linux Commodity Clusters
Thomas Sterling
Commodity cluster computing is the single fastest growing class of
high performance computing system architecture. Almost half of the
systems on the Top-500 list determined by the Linpack benchmark are
commodity clusters (this includes Constellations). By far the most
abundant family of commodity clusters is Linux clusters of low cost
PCs such as Beowulf-class systems. Today in the United States one
cluster is being constructed at Lawrence Livermore National Laboratory
by Linux NetworX with a peak capability of 9.2 Teraflops and at four
National Science Foundation sites a Grid of Linux clusters is being
assembled with an aggregate peak performance of 11.6 Teraflops. It is
likely that Linux clusters will lead the way to future performance
goals. Then when and how will such systems ultimately achieve
Petaflops scale performance for real world applications? This address
will examine the technology, architecture, and software issues that
will determine the roadmap leading to Linux clusters in the
trans-Petaflops performance regime. Included in this presentation will
be projections derived from the Semiconductor Industry Association's
predictions for future chip technology as well as extrapolations taken
directly from the database of the Top-500 list. One important aspect
of the future of commodity clusters is the ways in which the nodes
comprising future clusters may evolve in response to technology
opportunities and market forces. The conclusion of this talk will
demonstrate that the likely timeframe for a Petaflops-scale Linux
cluster is 2011-2012 at a cost of approximately $10 million.
Parallel Computing: a route to complexity and reality in
material simulations
Shiwu Gao
This talk contains two related parts. In the first part, I will
present our recent progress in parallelizing the WIEN package, the
full-potential (linearized) Augmented Plane Waves
(FP-(L)APW) method, which calculates materials properties from
the ab initio electron structure calculations based on density
functional theory. Both the parallelization scheme and the test
results on IBM SP3 (PDC) and the new Linux cluster (HPC2N) will be
given[1]. Comparison of performance and scaling on the two machines
will be presented and discussed.
The second part of the talk presents a few applications of ab initio
simulation methods (WIEN and VASP) to the materials and surface
problems. The following examples will be given: i) Adsorption induced
hydrogen bonding by CH group[2]; ii) Vibrational recognition of
hydrogen-bonded water networks on a metal surface[3]; and iii) An
electronic picture of hydrophilic and hydrophobic interactions at
surfaces[4].
[1] Shiwu Gao, Linear-scaling parallelization of WIEN package with MPI (to
be published).
[2] Shiwu Gao, J. R. Hahn, and W. Ho, Phys. Rev. Lett. (2002), submitted.
[3] Sheng Meng, L. F. Xu, E. G. Wang, and Shiwu Gao, Phys. Rev. Lett. (2002)
in print.
[4] Sheng Meng, E. G. Wang, B. Kasemo, and S. Gao, Phys. Rev. Lett. (2002),
submitted.
Designing a cluster for geophysical fluid dynamics applications
Göran Broström
Realistic simulations of oceanographic and atmospheric processes
require a great deal of computer power. The codes we use are publicly
available and are generally parallelized to run with, for instance,
MPI. However, it should be noted that the basic physics of these codes
implies an intense exchange of data between cpus. Thus,
the connection between cpus becomes critical for the computational
speed of the cluster, and a fast interconnect (i.e., SCI cards) is
needed for best performance. Further, the computational speed of a
single processor is typically memory-bound, implying that fast memory
buses should be used in clusters for geophysical fluid dynamics
(GFD). In this presentation I will show some of the work we have done
to design a 48 cpu cluster for GFD applications. The performance of
the cluster will also be discussed.
Grid generation for Neuro-Mechanical Networks
Magnus Sethson
Network systems are gaining more and more interest within the area
of mechanical systems engineering. Their natural relation to biological
systems is obvious, as is their fascinating ability to assemble
simple elements into structures solving complex tasks. By combining
mechanical engineering with neural networks we get a generic tool for
creating mechanisms and variable structures that can be very flexible.
We are currently evaluating such behaviors and therefore developing
a framework for dynamical simulations of such networks. The
calculations take place using a network of simple elements or
actuators. In the search for numerically stable simulation
environments we have created a grid generation tool that tries to
automatically generate a network grid suitable for such
calculations. The NMN, Neuro-Mechanical Networks, can be used for a
variety of applications from bio-mimicking to shape-changing airfoils.
The network is characterized by its random structure and the
discrete length of each simple element. This relates very closely to
the structure of tissues within our bodies, especially the human
heart. Presented are the first tests of such a grid generation tool
using genetic algorithms for establishing the discrete length
characteristics within some limits. The huge number of elements needed
to get a relevant resolution in the tissue properties leads to large
scale optimizations. The first numerical and scaling findings of a new
library, http://gadesignlib.sourceforge.net,
for genetic algorithms are presented when implemented on Linux
clusters.
Achieving Design Targets through Stochastic Optimization
- Current practise in Automotive and Aerospace industries
Petter Sahlin
The first full-scale stochastic automotive crash was run at BMW in
1997. Stochastic simulation requires extensive compute resources, at
that time some 700 CPUs. Nevertheless, the method was rapidly adopted
since it:
- enables decisions based on simulation alone earlier in the development
process
- implements the occurrence of uncertainty/scatter in input parameters
- enables validation of physical and digital tests
- defines the robustness of a complex design or a simulation
- enables efficient optimisation of a complex design, system or simulation
Today the method is implemented on a large scale in several leading
automotive and aerospace organizations. One of the reasons for the
increasing pace of adoption is a rapidly increasing access to cheap
CPUs enabled by the availability of Linux clusters and Grid
computing. This presentation offers a background on stochastic
simulation and why it is used, as well as offering an update on how
the method is deployed in areas such as CFD, crashworthiness and
occupant simulation, NVH, durability, fatigue, mass reduction and
multidisciplinary design optimization.
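The idea of propagating scatter in input parameters through many simulation runs can be sketched as follows. Everything here is hypothetical: the response function stands in for an expensive solver run, and the nominal values, standard deviations and number of runs are invented for illustration rather than taken from the talk.

```python
import random
import statistics


def response(thickness_mm, yield_mpa):
    """Hypothetical stand-in for one expensive simulation run."""
    return 0.5 * thickness_mm ** 2 * yield_mpa


def stochastic_study(n_runs, seed=0):
    """Sample scattered inputs, run the model, and summarise the output scatter."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        thickness = rng.gauss(2.0, 0.05)   # assumed nominal value and scatter
        strength = rng.gauss(300.0, 15.0)  # assumed nominal value and scatter
        results.append(response(thickness, strength))
    return statistics.mean(results), statistics.stdev(results)


mean, spread = stochastic_study(700)
print(f"mean response {mean:.1f}, standard deviation {spread:.1f}")
```

Because each run is independent, the loop maps naturally onto one run per CPU, which is why access to cheap cluster CPUs matters for this method.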
Experiences in parallelizing two geophysical models at SMHI
Tomas Wilhelmsson
The HIROMB model (HIgh Resolution Operational Model of the Baltic
Sea) delivers daily forecasts of currents, temperature, salinity,
water level, and ice conditions. HIROMB has been parallelized using a
block-based grid decomposition and load balance is fine-tuned by
assigning multiple blocks to each processor. Computation time in
winter season is dominated by the model's visco-plastic ice dynamics
component. Its parallelization was complicated by the need for a
direct sparse matrix solver. We use ParMETIS to load-balance the ice
solver in each time step.
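Assigning multiple blocks to each processor in order to even out the load can be illustrated with a simple greedy heuristic. This is a generic sketch and not the scheme used in HIROMB: the block names and costs are hypothetical, and the heuristic (largest block first, always onto the least loaded processor) is one common choice among many.

```python
import heapq


def assign_blocks(block_costs, n_procs):
    """Greedy assignment: largest block first, onto the least loaded processor."""
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for block, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(block)
        heapq.heappush(heap, (load + cost, p))
    return assignment


# Hypothetical per-block work estimates, e.g. number of active grid points.
costs = {"A": 8, "B": 7, "C": 6, "D": 5, "E": 4}
plan = assign_blocks(costs, 2)
loads = {p: sum(costs[b] for b in blocks) for p, blocks in plan.items()}
print(plan, loads)
```

For this input the heuristic ends with loads of 17 and 13, whereas a perfect split of 15 and 15 exists. Cutting the grid into more, smaller blocks gives the heuristic more freedom, which is the motivation for fine-tuning with multiple blocks per processor.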
The MATCH model (Multiscale Atmospheric Transport and CHemistry) is
a regional Eulerian air-pollution dispersion model. For nuclear
emergency response applications, a Lagrangian particle model describes
the initial dispersion of pollutants from point sources. MATCH has
been parallelized with a same-source approach using a Fortran loop and
index translation tool (FLIC) and a parallel runtime library (RSL),
both developed at Argonne National Laboratory.
Grid Enabled Optimisation and Design Search for Engineering (GEODISE)
Prof. Simon J. Cox
GEODISE is developing grid-based seamless access to an intelligent
knowledge repository, a state-of-the-art collection of optimisation and
search tools, industrial strength analysis codes, and distributed
computing and data resources.
Engineering design search and optimisation is the process whereby
engineering modelling and analysis are exploited to yield improved
designs. In the next 2-5 years intelligent search tools will become a
vital component of all engineering design systems and will steer the user
through the process of setting up, executing and post-processing design
search and optimisation activities. Such systems typically require
large-scale distributed simulations to be coupled with tools to describe
and modify designs using information from a knowledge base. These tools
are usually physically distributed and under the control of multiple
elements in the supply chain. Whilst evaluation of a single design may
require the analysis of gigabytes of data, to improve the process of
design can require assimilation of terabytes of distributed data.
Achieving the latter goal will lead to the development of intelligent
search tools.
Our focus is on the use of computational fluid dynamics with BAE
Systems, Rolls Royce, and Fluent. GEODISE is being developed by the
Universities of Southampton, Oxford and Manchester in collaboration with
other industrial partners working in the domains of hardware (Intel),
software (Microsoft), systems integration (Compusys), knowledge
technologies (Epistemics), and grid-middleware (Condor).
NorduGrid - a Nordic Grid
Mattias Ellert
The NorduGrid project started in May 2001 as a collaboration
between the Nordic countries to establish a grid infrastructure in the
region. By connecting several computer clusters located at different
locations a computing grid is formed. Users can then submit requests
for the execution of computational tasks to the grid and the task is
transferred to one of the clusters on the grid that has the hardware
and software required to do the job. The grid set up by the NorduGrid
project has successfully been used by High Energy Physicists to do
Monte Carlo simulations of particle collisions in the ATLAS detector
at CERN.
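The matching step described in the abstract above, where a task is sent to a cluster that has the required hardware and software, can be sketched as a simple filter. The class and field names below are hypothetical and do not reflect the NorduGrid information system or its resource broker.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    free_cpus: int
    memory_mb: int
    software: set = field(default_factory=set)


@dataclass
class JobRequest:
    cpus: int
    memory_mb: int
    software: set = field(default_factory=set)


def find_cluster(job, clusters):
    """Return the first cluster that satisfies the job's requirements, else None."""
    for cluster in clusters:
        if (cluster.free_cpus >= job.cpus
                and cluster.memory_mb >= job.memory_mb
                and job.software <= cluster.software):
            return cluster
    return None


# Hypothetical grid with two clusters and one job that needs a given package.
grid = [
    Cluster("alpha", free_cpus=4, memory_mb=512, software={"mpi"}),
    Cluster("beta", free_cpus=16, memory_mb=1024, software={"mpi", "atlas-sim"}),
]
job = JobRequest(cpus=8, memory_mb=512, software={"atlas-sim"})
chosen = find_cluster(job, grid)
print(chosen.name if chosen else "no matching cluster")
```

A real broker would also rank the candidates, for example by queue length or data locality, rather than take the first match.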
Transparent access to finite element applications using grid technology
Jonas Lindemann
Making clusters and grids available for a wider audience is an
important task. Using finite element software or in fact computational
software in general on clusters and grids today involves several
steps before a job can be executed. For many users this procedure is
cumbersome and therefore the powerful resources of a Grid are not
utilized. To facilitate a change the procedure must be made simpler
and easier to use. To this end a concept cluster is currently being
developed, using PHP and CORBA. The cluster will have web-based access
for job-submission, monitoring and result retrieval.
Röde Orm: a computational portal for numerical computations
of Optical problems
Manuel Lopez Quiroga-Teixeiro
At the Photonics Laboratory at Chalmers University of Technology a
Linux cluster is used for intensive computations. Optical problems at
that Department involve both stochastic based simulations for optical
fiber systems and integrated laser structures having sizes of many
wavelengths. Solutions using Monte Carlo methods and parallelism are
implemented as main parts of this computational portal.
Experiences in Management and in finding external
users for the Hirmu Cluster
Michael Gindonis
In the fall of 2000 the Technology Programme of the Helsinki
Institute of Physics received funding from the Ehrnrooth Foundation to
build a modern PC Cluster for research purposes and in order to allow
access to High Performance Computing to non-traditional users. This
Paper/Presentation will cover observations in the following areas:
Remote management of Staff, User support, needs and expectations. This
presentation will also attempt to define criteria that lead to
successful collaborations.
The HPC2N Super Cluster: From Bits and Pieces to a
Benchmarked TOP100 Supercomputing System
Erik Elmroth
In Spring 2002, HPC2N built the first Swedish Linux cluster with
supercomputer capacity. The cluster consists of 240 rack-mounted AMD
MP2000+ processors, interconnected with a low latency, high bandwidth,
3D torus SCI network. The system is truly self-made by HPC2N,
including the specification and building of all the individual nodes.
The system has a peak performance of 800 Gflops/s and was ranked 94 on
the TOP500 list of the world's fastest computers in June 2002, with a
HP Linpack benchmark result of 480.7 Gflops/s. This presentation
includes a description of our system and the work to build it, as well
as performance analyses for processors, dual nodes, network bandwidth,
and full system scalability. Benchmark results presented include HP
Linpack, NAS Parallel, STREAM, and Pallas MPI Benchmarks.
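The figures in the abstract above allow a quick consistency check. The sketch below uses only the numbers quoted there (800 Gflops peak, 480.7 Gflops Linpack, 240 processors); the derived quantities are simple ratios and not results reported by the speaker.

```python
peak_gflops = 800.0     # quoted peak performance
linpack_gflops = 480.7  # quoted HP Linpack result
processors = 240        # quoted number of processors

# Fraction of peak achieved on Linpack, and peak per processor.
efficiency = linpack_gflops / peak_gflops
peak_per_processor = peak_gflops / processors

print(f"Linpack efficiency: {efficiency:.1%}")
print(f"Peak per processor: {peak_per_processor:.2f} Gflops")
```

The benchmark thus reaches roughly 60% of peak, with about 3.3 Gflops of peak performance per processor.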
Challenges in building large Linux clusters
Ole Holm Nielsen
Linux clusters with hundreds of nodes pose a number of challenges which
are absent on clusters with just tens of nodes. We have recently built a
480-node cluster using standard Pentium-4 office-PCs for a total peak
performance of 2.1 TeraFLOPS.
This presentation will discuss the issues of shelf construction,
cooling system, and how you cope with a truck-load of PC boxes.
Our choice of networking technology and topology will be discussed.
Automated Linux installation over the network is described,
as are the kinds of servers that we have chosen for administration
and NFS file-service. Finally, we discuss our batch production
environment.
PIM and the Challenge to Linux for supercharged Clusters
Thomas Sterling
Two emergent architectures are establishing the likely directions
for future high performance computing. These are scalable commodity
clusters and processor in memory (PIM) technology. The potential
merger of clusters and PIM presents an exciting opportunity to achieve
unprecedented performance while improving performance to
cost. Substituting PIM devices for at least part of the main memory of
cluster nodes can dramatically enhance performance capabilities while
providing acceleration for data intensive computational problems.
However, a number of challenges impose barriers to achieving this
opportunity. This talk will describe the architectural and software
issues related to exploiting future generation PIM devices in
commodity clusters.
From Beowulf to Professional Turnkey Solutions
Einar Rustad
Scali's software technology was developed in parallel with the
development of the very early market for clusters in academia.
The goal was to develop highly efficient and robust software for the
two critical areas for making clusters a viable alternative to
traditional supercomputers and cc-NUMA machines, cluster communication
and cluster management. This enables both efficient execution of a wide
range of applications and cost-effective operation and management for
users and system administrators.
Clusters based on Scali's software technology are now being offered to
industrial users by large hardware vendors like HP and Dell in
addition to local distributors and system integrators world-wide.
The key asset inside Scali's software solutions is intimate knowledge
of processor, memory and interconnect architectures as well as an
overall understanding of parallel applications and their requirements.
Scali works closely with ISVs, HSVs and customers to obtain ultimate
performance for the end-user applications.
Linux clusters from HP for scalable scientific computing
Martin Anthony Walker
Achieving high sustained application performance on compute
clusters imposes hard requirements on the balance among the speed of
the processors, and the bandwidth and latency of memory access and
inter-node communication, as well as I/O performance. The usability
of large scale clusters requires appropriate file systems and system
software for cluster and workload management. HP's approach to these
issues will be presented, with concrete examples from current
installations based on Itanium 2 processors.
Linux Cluster Solutions with IBM
Kathleen Bonadonna
Linux is important to IBM. It is an integral part of the Internet, is
rapidly becoming the application development platform of choice, and
is increasingly being used in high-performance computing. Over the
last few years, IBM has become the industry leader in providing Linux
Solutions and a key part of the worldwide Linux community. From a
Linux Cluster perspective, IBM has installed clusters across the world
including many of the largest ones in existence today. IBM's focus is
to offer fully integrated and tested clusters based on IBM's xSeries
rack-optimized servers while providing greater flexibility, superior
manageability, excellent price performance and the ability to create
powerful, flexible solutions for high-performance computing. The talk
will also cover IBM's future direction in Linux Clusters including
blades and other exciting technology.