LCSC 2006
October 17-18, 2006
Hosted by
National Supercomputer Centre (NSC)
Linköping University, Sweden


Programme for the 7th Annual Workshop on Linux Clusters for Super Computing

October 17


Two tutorials are offered this year:

T2 - The Gaussian 03 tutorial offers a unique opportunity for the quantum chemical community in Sweden to get an in-depth presentation of the most widely used quantum chemistry code of today.

T3 - The Efficient parallel applications on NUMA machines tutorial presents the Altix supercomputer, with special focus on Mozart at NSC, Sweden's largest shared-memory resource.

The Using NSC systems tutorial (T1) is cancelled.

For detailed information on the tutorials, please click on the links above.

09:30 Registration
      Registration for the morning tutorials (T2 and T3) in the open area outside room BL33, B-huset (entrance 23), Linköping University.
10:00 Morning session
      T2 in room BL33, B-huset (entrance 23).
      T3 in room BL34, B-huset (entrance 23).
13:00 Lunch
      The meal is complimentary for T2 and T3 participants, and will be served at Kårallen.
14:00 Afternoon session
      T2 in rooms SU10 and SU11, B-huset (entrance 27).
      T3 in room BL34, B-huset (entrance 23).
18:00 Evening seminar
      Linux on different architectures, a comparison of Cell, Intel, Opteron, and Blue Gene
      Nils Smeds, IBM
      Co-organized with Lysator, Linköping University Computer Society, as part of their seminar series UppLYSning. In Visionen, B-huset (entrance 27), Valla Campus, Linköping University.
      Everybody is welcome! (no fee)
19:30 Dinner at Munkkällaren, downtown.
      Läroverksgatan 7, Linköping. [map]
      Sponsored by IBM.

October 18


The workshop seminars will be given in auditorium Visionen in B-huset, entrance 27, with wireless internet connection. [map]

8:30 Registration
Outside the auditorium.
Chair: Sven Stafström
9:00 Welcome
Sven Stafström, Director NSC
9:15 Multi-core Beowulf Clusters for Petaflops-scale Computing
Thomas Sterling, Louisiana State University
abstract slides
10:00 Multicore from an Application's Perspective
Erik Hagersten, Uppsala University
abstract slides
10:30 BREAK Coffee and Tea outside the auditorium
Chair: Erik Hagersten
11:00 A Shared Memory Programming Technique Applied to Computational Chemistry Programs
Roberto Gomperts, SGI
abstract slides
11:30 Using FPGAs in Supercomputing - Reconfigurable Supercomputing
Stefan Möhl, Mitrionics
abstract slides
12:00 From gaming to serious supercomputing fun
Juan Jose Porta, IBM
12:30 LUNCH in Kårallen
Chair: Peter Münger
14:00 NDGF - a Joint Nordic Grid Facility
Lars Fischer, NDGF
abstract slides
14:30 A new 7 TFLOP cluster in NOTUR
Cyril Banino-Rokkones, NTNU, Norway
abstract slides
15:00 HPC facilities at CSC, the Finnish IT Center for Science
Juha Fagerholm, CSC, Finland
abstract slides
15:30 BREAK Coffee and Tea outside the auditorium
Chair: Niclas Andersson
16:00 Experiences from Lustre deployment at NSC
Peter Kjellström, NSC
16:30 dCache, the Peta-Scale Storage Element
Patrick Fuhrmann, DESY, Germany
abstract slides
17:00 The Swedish HPC landscape
Sverker Holmgren, Director SNIC
abstract slides
17:30 Closing


Multi-core Beowulf Clusters for Petaflops-scale Computing

Professor Thomas Sterling
Department of Computer Science
Center for Computation and Technology
Louisiana State University

The continuing opportunity presented by semiconductor technology trends characterized as Moore's Law also imposes significant challenges on processor, system, and software designers. The point of diminishing returns has been reached in the exploitation of logic complexity in processor design, and this, combined with severe problems related to power consumption, has forced the industry to the strategy of integrating multiple processor cores on a single die. This "multi-core" methodology is a dramatic change from prior conventional practice and demands for the first time that the mainstream hardware and software community embrace parallel processing. Commodity Linux clusters such as low-cost Beowulf-class systems must necessarily integrate multi-core components, both as homogeneous and heterogeneous structures. It can be anticipated that most commercial processor chips will incorporate eight cores before the end of the decade, and with additional heterogeneous structures integrating accelerators, this degree of parallelism is already being achieved. Large commodity clusters in the hundreds-of-Teraflops range are already being planned, and Petaflops-scale clusters will be deployed in the early part of the next decade if not before. While promising to continue the growth of peak performance, multi-core components will aggravate the challenge of achieving effective sustained performance for all but the most trivial distributed application algorithms. Cache fragmentation, the memory wall, and system-wide latency will all exacerbate cluster operation. ParalleX is an innovative model of computation being developed to address these challenges and take advantage of future multi-core clusters as they scale to Petaflops capability. ParalleX replaces the dominant communicating-sequential-processes model (e.g. MPI) with one based on message-driven, split-phase-transaction, multi-threaded processing, and it replaces global barriers with local futures-based synchronization.
ParalleX computing exhibits intrinsic latency hiding while exposing abundant parallelism, especially for sparse data structures with embedded meta-data such as directed graphs. This talk will discuss the current trends towards Petascale multi-core Beowulf clusters and present the innovative ParalleX model of computation that may provide a framework for addressing the many challenges implied by these new system architectures.
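The contrast between global barriers and local futures-based synchronization can be sketched in a few lines. The example below is not ParalleX code (ParalleX was research software under development at the time); it merely illustrates the idea using Python's concurrent.futures, where each consumer waits only on the one result it actually depends on:

```python
# Sketch: local futures-based synchronization instead of a global barrier.
# Illustrative only -- this is plain Python, not ParalleX.
from concurrent.futures import ThreadPoolExecutor

def produce(i):
    # Stage 1: each worker produces a partial result.
    return i * i

def consume(fut, i):
    # Stage 2: each consumer blocks only on the one future it needs,
    # instead of all workers meeting at a stage-wide barrier.
    return fut.result() + i

with ThreadPoolExecutor(max_workers=4) as pool:
    stage1 = [pool.submit(produce, i) for i in range(4)]
    stage2 = [pool.submit(consume, f, i) for i, f in enumerate(stage1)]
    results = [f.result() for f in stage2]

print(results)  # [0, 2, 6, 12]
```

Because no stage-wide barrier exists, the consumer of a fast producer can start as soon as its own input is ready, which is the latency-hiding property the abstract attributes to ParalleX.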

Linux on different architectures, a comparison of Cell, Intel, Opteron, and Blue Gene

Nils Smeds
IBM Deep Computing
Stockholm, Sweden

IBM has shown a long-lasting dedication to Linux over the years. Today, Linux is a supported alternative on all IBM platforms, from laptops and workstations to supercomputers such as Blue Gene and "RoadRunner". The most recently released platform is the Cell Broadband Engine, a CPU co-developed with Sony and Toshiba. This talk will focus on the Cell BE: its architecture, its application areas, and how you can get started programming it.

Multicore from an Application's Perspective

Professor Erik Hagersten
Computer Architecture
Uppsala University, Sweden

The multicore revolution is driven by the hardware vendors in a desperate attempt to continue their quest for performance. While this strategy has proven successful in some application areas, the performance footprint offered by multicore technology is disruptive in many respects, such as cache size per thread, intra-thread communication cost, memory bandwidth per thread, and the choice between capacity and capability computing for increased performance. In this talk we will identify these disruptive technology areas and their corresponding application challenges, as well as discuss some possible ways of overcoming these problems. We show how these fairly straightforward techniques can be used to speed up Gauss-Seidel by a factor of three.
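The abstract does not name the specific techniques. As a point of reference, the kind of baseline kernel being accelerated, a Gauss-Seidel sweep for a 1-D Poisson problem, can be sketched as follows (a minimal illustration, not code from the talk):

```python
# Minimal Gauss-Seidel sweeps for the 1-D Poisson problem -u'' = f,
# u(0) = u(1) = 0, on a uniform grid with spacing h. The in-place update
# uses the newest neighbour values, which is what distinguishes
# Gauss-Seidel from Jacobi iteration.
def gauss_seidel(f, u, h, sweeps):
    n = len(u)
    for _ in range(sweeps):
        for i in range(1, n - 1):  # interior points only
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

n = 9
h = 1.0 / (n - 1)
f = [1.0] * n      # constant right-hand side
u = [0.0] * n      # zero initial guess and boundary values
gauss_seidel(f, u, h, 200)
# Exact solution of -u'' = 1 is u(x) = x(1-x)/2, so u[4] (x = 0.5)
# should converge to about 0.125.
```

Cache-aware variants of the kind the abstract alludes to typically restructure the order of such sweeps (for instance by blocking or temporal tiling) without changing the arithmetic, so that grid points are reused while they still reside in cache.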

A Shared Memory Programming Technique Applied to Computational Chemistry Programs

Dr Roberto Gomperts
Silicon Graphics Inc.

We will present a parallelization technique for shared memory (OpenMP) that we have used to parallelize sequential programs. This technique has been applied successfully in the past to various computational chemistry programs such as Gaussian (quantum chemistry) and Amber and CHARMM (molecular dynamics).

Using FPGAs in Supercomputing - Reconfigurable Supercomputing

Stefan Möhl
Co-founder, VP, CTO, Mitrionics AB, Sweden

Using FPGAs to accelerate computations has long been a topic for researchers seeking ways to push the limits of supercomputing. Today, FPGA supercomputing generates more interest than ever, with major system vendors offering FPGA-equipped computers off the shelf, making the technology far more accessible to a wider audience. But putting FPGAs in supercomputers radically changes the playing field: traditional hardware design methods are no longer applicable, for several reasons. This talk presents how Mitrionics makes it possible to run software in FPGAs by putting a processor in the FPGA, allowing the user to program the processor instead of designing an electronic circuit to place in the FPGA.

From gaming to serious supercomputing fun

Juan Jose Porta
IBM Boeblingen Labs, Germany

The Cell Broadband Engine (CBE, a.k.a. the Cell microprocessor) has been jointly developed by IBM, Sony, and Toshiba. The Cell Broadband Architecture is intended to be scalable through on-chip integration of parallel vector processing: a general-purpose Power processor core is interconnected with eight special-purpose streaming SIMD cores ("Synergistic Processing Elements", or SPEs). The first major commercial application of Cell is in Sony's upcoming PlayStation 3 game console, but Cell was developed to be both a general-purpose and a multimedia processor.

Exploiting the CBE as a standard Power Architecture processor with tightly attached (on-chip) SIMD acceleration capabilities opens up a non-disruptive migration path to significantly improve the price/performance competitiveness (Teraflops per €, Watt, and/or unit of floor space) of Power blades and clusters. The accelerator-based programming model has the additional attraction of leveraging the Linux-on-Power and VMX ecosystems while opening up opportunities to push selected applications into the supercomputing realm.

Shifting computational work from the main PPE processor to the SPEs (by running special subroutines that take over portions of the application) is a straightforward extension of current VMX tuning techniques, enabling users to solve complex problems in less time without having to deal with the complexity of reconfigurable computing techniques such as FPGAs. HPC segments that can potentially benefit from the CBE's capabilities include astrophysics, bioinformatics, computational chemistry, medical imaging, seismic processing, climate modelling, computational fluid dynamics, financial engineering, quantum chromodynamics, and other compute-intensive areas.

This presentation will discuss the concepts of the new Cell Broadband Engine and its impact on software development and performance. Operational-environment and programming-model concepts, as well as first performance results for different benchmarks and applications, will be presented.

NDGF - a Joint Nordic Grid Facility

Lars Fischer
NDGF, Copenhagen, Denmark

NDGF - the Nordic Data Grid Facility - is a production grid facility that leverages existing computational resources and grid infrastructures in the Nordic countries. NDGF was established in the spring of 2006. Its mission is to ensure that researchers in the Nordic countries can create and participate in computational challenges and e-Science projects of a scope and size unreachable for the national research groups alone. NDGF facilitates the use of computational and data storage resources, undertakes middleware development and application integration, and does project management. The talk will elaborate on the motivation and organization of NDGF, report on its current status, and present some early use cases.

A new 7 TFLOP cluster in NOTUR

Dr Cyril Banino-Rokkones
Dept. of Computer & Information Science (IDI)
NTNU Trondheim, Norway

From the beginning of November, a new 7.5 TFLOPS cluster will be available in NOTUR. The cluster is based on the IBM Power5+ architecture with IBM Federation technology as interconnect. The new computing resource will almost triple NOTUR's total computing power. It will be used for operational weather forecasting, CFD, construction mechanics, and physics calculations. The first part of the talk presents the system, while the second part reports on promising uniprocessor experiments performed on a provisional test-bed installation.

HPC facilities at CSC, the Finnish IT Center for Science

Juha Fagerholm
CSC, Helsinki, Finland

CSC has recently decided on new facilities for HPC resources in Finland. The aim was to obtain both high-end capacity and price-performance capacity. The selection criteria were overall performance, system functionality, total costs, and cooperation aspects. The systems will be installed in stages during 2006-2008 and should be powerful enough to serve the Finnish computational science community for the next few years. Performance was evaluated using several application benchmarks, carefully selected to cover fields of computational science that are strong in Finland. Most of the available resources will be allocated to research groups on the basis of their research proposals. Grand challenge projects for the supercomputer will be chosen with the help of the Nordic Computational Grand Challenge Survey, which was initiated to collect the ongoing and upcoming large-scale computing projects and needs in the Nordic countries.

Experiences from Lustre deployment at NSC

Peter Kjellström
National Supercomputer Centre, Sweden

Lustre is an open source parallel file system developed by Cluster File Systems (CFS). Lustre was designed for use in HPC environments and is deployed on several of the largest machines in the world. This talk combines an introduction to Lustre (general information about the file system) with a description of what we at NSC have done: configurations, goals, what the users thought, what broke, and what we will do next.

dCache, the Peta-Scale Storage Element

Patrick Fuhrmann
DESY, Germany

With the increasing size of modern experiments, such as those at the upcoming LHC accelerator at CERN, and the political demand of governments to spend money locally as well as to provide computing infrastructure to their national laboratories, the requirements on data storage systems have increased significantly. Besides the traditional ability to store data on disk or tape devices, storage systems now have to provide interfaces to wide-area transfer protocols as well as to protocols for shaping transfer bandwidth and managing local storage. Moreover, they need to offer a set of POSIX-like access protocols so that analysis jobs on the attached computing farms get random, low-latency access to datasets. Highly demanding analyses may even make it necessary to replicate frequently used datasets to improve overall throughput. In addition, depending on the storage strategy of the particular experiment, some laboratories need to store data persistently in tertiary storage, which requires smart scheduling of data streams between disk and tape. This presentation will introduce dCache, one of the official LCG Storage Elements, which has proven able to cope with these peta-scale storage requirements.

The Swedish HPC landscape

Sverker Holmgren
Director SNIC
Uppsala University, Sweden

The Swedish metacentre for HPC, SNIC (Swedish National Infrastructure for Computing), comprises six member centres that together provide strong national HPC services. SNIC has recently produced a landscape document that puts the Swedish HPC infrastructure in the context of current trends in the computational sciences as well as in service and hardware development. This document is presented, including an analysis of what types of HPC resources are needed to provide Swedish researchers with a competitive HPC infrastructure. Road maps outlining how SNIC will implement the landscape are also described, giving blueprints for the future of Swedish HPC over the next three years.