LCSC
5th Annual Workshop on
Linux Clusters for Super Computing


October 18-21, 2004
Hosted by
National Supercomputer Centre (NSC)
Linköping University, Sweden

NGN
Nordic Grid Neighbourhood
Workshop



Programme in Detail
LCSC - Abstracts
NGN - Abstracts

Most activities take place on the first floor of the Collegium Conference Centre, Mjärdevi Science Park, Linköping.

Lunches will be served in the restaurant on the ground floor of Collegium. Coffee and tea will be served outside the auditorium in the exhibition area during breaks.

The exhibition will start at 10:00 on Monday and last until 16:00 on Tuesday.

Wireless will be available in the auditorium.

October 18

Tutorials

08:30 Registration
Tutorials
The full tutorial schedule and descriptions can be found on the separate Tutorials page.

5th Annual Workshop on Linux Clusters for Super Computing

12:00 Registration
12:30 LUNCH
13:30 Opening
Welcome
Sven Stafström, Director, NSC
Bjørn Hafskjold, Manager, NOTUR
Ulf Nilsson, Vice Dean, Linköping Institute of Technology, LiU
Niclas Andersson, LCSC local organization
Welcome to Linköping University
14:00 LCSC Keynote
Cluster Computing: You've come a long way in a short time
Jack Dongarra, University of Tennessee & Oak Ridge National Laboratory
abstract slides
 
15:00 BREAK Coffee and Tea outside the auditorium
 
Session #1
15:30 Digital brain atlasing: the marriage of neuroinformatics and eSciences
Jan G. Bjålie, University of Oslo
16:00 Management of deep memory hierarchies - recursive blocking and hybrid data structures for dense matrix computations
Bo Kågström, Umeå University
abstract slides
16:30 The BlueGene/L Supercomputer and LOFAR/LOIS
Bruce Elmegreen, IBM Watson Research Centre
abstract slides
17:00 Scalable Algorithms for the Solutions of Large Sparse Linear Systems
Jacko Koster, University of Bergen
slides
 
19:00 DINNER at Hotel Ekoxen, downtown, see map.


October 19

Session #2
08:30 SCI Socket: The fastest socket on earth and the impact on storage and applications
Atle Vesterkjær, Dolphin Interconnect Solutions Inc.
abstract slides
09:00 Current Status of InfiniBand for HPC
Peter Kjellström, Linköping University
slides
09:30 Presentation of PDC's New Machine - Technology and Benchmarks
Per Öster & Ulf Andersson, Royal Institute of Technology
 
10:00 BREAK Coffee and Tea outside the auditorium
 
Session #3
10:30 Using Linux Clusters for Full-Scale Simulation of Cardiac Electrophysiology
Xing Cai, Simula Research Laboratory
abstract slides
11:00 Bringing Space On Line: High-Performance Computing for a Distributed Space Probing Sensor Network
Lars K. S. Daldorff, Uppsala University
abstract slides
11:30 EVERGROW - probing the Internet of 2025
Erik Aurell, SICS and Royal Institute of Technology
slides
 
12:00 LUNCH in Collegium restaurant, ground floor
(LCSC programme committee meeting during lunch)
 
Session #4
13:00 Application Performance on High-End and Commodity-class Computers
Martyn F. Guest, CLRC Daresbury Laboratory
abstract slides
13:45 Linux Performance Analysis Tools: Parallel, Serial and I/O Performance Characterization
Philip J. Mucci & Per Ekman, Royal Institute of Technology
abstract slides
14:15 MPI Microbenchmarks: Misleading and Dangerous
Greg Lindahl, Pathscale Inc.
 
14:45 BREAK Coffee and Tea outside the auditorium
 
Session #5
15:15 HPC4U: Closing the gap between Resource Management and the Next Generation Grid
Matthias Hovestadt, University of Paderborn
abstract
15:45 TetSplat: Interactive Visualization of Huge Tetrahedral Meshes
Ken Museth, Linköping University
abstract
16:15 GRIA - Grid Resources for Industrial Applications
Steve Taylor, IT Innovation, UK
Antonella Frigerio, CESI, IT
slides #1
slides #2
 
16:45 LCSC closing remarks

Evening Seminar

18:00 BlueGene: Innovations in parallel computing
Bruce Elmegreen, IBM Watson Research Centre
Co-organized with Lysator, the Linköping University Computer Society, as part of their seminar series UppLYSning, in Visionen, building B. Everybody is welcome! (no fee)

October 20

Nordic Grid Neighbourhood workshop

08:30 Registration
09:00 Welcome to NGN
Farid Ould-Saada, University of Oslo
slides
National Grid Initiatives
09:20 On the St. Petersburg State University computing centre and the first results in the GRID applications and data challenge for ALICE
Yuri Galyuck, St.Petersburg State University
slides
The first year of the Estonian Grid
Andi Hektor, NICPB
slides
Grid activities in Aalborg
Henrik T. Jensen, Aalborg University
slides
 
10:00 BREAK Coffee and Tea outside the auditorium
 
10:30 Finnish Grid Activities
Michael Kustaa Gindonis, Helsinki Institute of Physics
slides
Norgrid Activity
Jacko Koster, University of Bergen
slides
Baltic Grid Conference and other Lithuanian activities
Aleksandr Konstantinov, University of Oslo
slides
TBA (*)
Olle Mulmo, Royal Institute of Technology
slides
 
12:00 LUNCH in Collegium restaurant, ground floor
 
Middleware and Applications
13:00 Activities and Perspectives of IHPC&IS
Vladimir Korkhov, Institute for High Performance Computing and Information Systems
abstract slides
Allocation Enforcement in SweGrid using the SweGrid Accounting System (SGAS)
Peter Gardfjäll, Umeå University
Application Portal
Jonas Lindemann, Lund University
slides
gLite, the next generation middleware for Grid computing
Oxana Smirnova, Lund University
slides
St. Petersburg State University - scientific and communication links in the St. Petersburg region and some future plans for GRID applications
Grigori Feofilov, St.Petersburg State University
slides
 
14:30 BREAK Coffee and Tea outside the auditorium
 
15:00 Round table discussion
organisation, goals, program, next workshop
 
16:00 NGN closing remarks
Farid Ould-Saada, University of Oslo

(*)
TBA = To Be Announced
[...]
Title is missing. Keywords give a hint of possible contents.

LCSC Abstracts

Cluster Computing: You've come a long way in a short time
Jack Dongarra, University of Tennessee & Oak Ridge National Laboratory

In the last 50 years, the field of scientific computing has undergone rapid change: we have experienced a remarkable turnover of technologies, architectures, vendors, and the usage of systems. Despite all these changes, the long-term evolution of performance seems to be steady and continuous.

The acceptance of parallel systems not only for engineering applications but also for new commercial applications, especially database applications, emphasized different criteria for market success, such as system stability, continuity of the manufacturer, and price/performance. Due to these factors and the consolidation in the number of vendors in the market, hierarchical systems built with components designed for the broader commercial market are currently replacing homogeneous systems at the very high end of performance. Clusters built with off-the-shelf components are also gaining more and more attention and today hold a dominant position in the Top500.

In this talk we will look at some of the existing and planned high-performance computer architectures and at the interconnection schemes they are using.

Management of deep memory hierarchies - recursive blocking and hybrid data structures for dense matrix computations
Bo Kågström, Umeå University

Matrix computations are both fundamental and ubiquitous in computational science and its vast application areas. Along with the development of more advanced computer systems with complex memory hierarchies, there is a continuing demand for new algorithms and library software that efficiently utilize and adapt to new architecture features. In this presentation, we review some of the recent advances made by applying the paradigm of recursion to dense matrix computations on today's memory-tiered computer systems (see Elmroth, Gustavson, Jonsson and Kågström, SIAM Review, Vol. 46, No. 1, 2004, pp. 3-45). Recursion allows for efficient utilization of a memory hierarchy and generalizes existing fixed blocking by introducing automatic variable blocking that has the potential of matching every level of a deep memory hierarchy. Novel recursive blocked algorithms offer new ways to compute factorizations such as Cholesky and QR and to solve matrix equations. In fact, the whole gamut of existing dense linear algebra factorizations is beginning to be re-examined in view of the recursive paradigm. The use of recursion has led to new hybrid data structures and optimized superscalar kernels. The results we survey include new algorithms and library software implementations for level 3 kernels, matrix factorizations, the solution of general systems of linear equations, and several common Sylvester-type matrix equations. The software implementations we survey are robust and show impressive performance on today's high performance computing systems.

We end by discussing some open problems and ongoing work on using recursion for solving periodic matrix equations, leading to recursive blocked algorithms for 3-dimensional data structures (matrices). The third dimension is the periodicity index of the matrices.
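
To make the recursive-blocking idea concrete, here is a minimal illustrative sketch in C (not taken from the library software surveyed in the talk) of a recursive blocked matrix multiply: the largest dimension is halved until the subproblem is small, so the recursion automatically produces blockings for every level of the memory hierarchy. The cutoff size and the plain triple-loop leaf kernel are assumptions for illustration only; the real implementations use hybrid data structures and tuned superscalar kernels.

    #include <stddef.h>

    #define LEAF 64  /* assumed cutoff: a block this small is treated as cache-resident */

    /* Plain triple loop used once a block is small enough (the "leaf kernel").
     * A is m x k, B is k x n, C is m x n, all row-major with leading dimension ld. */
    static void mm_leaf(size_t m, size_t n, size_t k,
                        const double *A, const double *B, double *C, size_t ld)
    {
        for (size_t i = 0; i < m; i++)
            for (size_t p = 0; p < k; p++)
                for (size_t j = 0; j < n; j++)
                    C[i*ld + j] += A[i*ld + p] * B[p*ld + j];
    }

    /* Recursive blocked C += A*B: split the largest dimension in half until the
     * subproblem fits the leaf size.  The recursion generates variable blockings
     * for every level of the memory hierarchy without any tuning parameters
     * beyond the leaf size. */
    void mm_rec(size_t m, size_t n, size_t k,
                const double *A, const double *B, double *C, size_t ld)
    {
        if (m <= LEAF && n <= LEAF && k <= LEAF) {
            mm_leaf(m, n, k, A, B, C, ld);
        } else if (m >= n && m >= k) {          /* split rows of A and C */
            size_t m1 = m / 2;
            mm_rec(m1,     n, k, A,          B, C,          ld);
            mm_rec(m - m1, n, k, A + m1*ld,  B, C + m1*ld,  ld);
        } else if (n >= k) {                    /* split columns of B and C */
            size_t n1 = n / 2;
            mm_rec(m, n1,     k, A, B,       C,       ld);
            mm_rec(m, n - n1, k, A, B + n1,  C + n1,  ld);
        } else {                                /* split the inner dimension */
            size_t k1 = k / 2;
            mm_rec(m, n, k1,     A,       B,          C, ld);
            mm_rec(m, n, k - k1, A + k1,  B + k1*ld,  C, ld);
        }
    }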

The BlueGene/L Supercomputer and LOFAR/LOIS
Bruce Elmegreen, IBM Watson Research Centre

BlueGene/L is a new type of computer designed by IBM for extremely fast IO, internal communications, and floating point computations. A single rack contains 1024 node chips, each of which has 2 processors and 4 floating point units. The sustained computation speed of a rack is around 2.5 Tflops. A rack can also accept up to 128 1-Gbit ethernet IO connections, and it has a three-dimensional torus for internal communications between nodes at 2.8 Gbps in each direction. Many racks may be connected together to make a single large torus, or multiple tori, each running its own job. The Netherlands Foundation for Research in Astronomy (ASTRON), which is headquartered in Dwingeloo, is planning to acquire 6 racks of BlueGene/L for use as a central processor in the Low Frequency Array Radio Telescope (LOFAR). The talk will discuss the characteristics of BlueGene/L, the operation and requirements of LOFAR and the LOFAR Outrigger in Scandinavia (LOIS), and the match between BlueGene/L and these new telescope systems.
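
As a back-of-envelope aid, the per-rack figures quoted above can be aggregated for the planned 6-rack LOFAR configuration. The short sketch below simply multiplies the numbers from the abstract and makes no further assumptions about the machine.

    #include <stdio.h>

    /* Aggregate figures for the planned 6-rack LOFAR central processor,
     * using only the per-rack numbers quoted in the abstract above. */
    int main(void)
    {
        const int racks = 6;                 /* planned by ASTRON for LOFAR */
        const double tflops_per_rack = 2.5;  /* sustained, per the abstract */
        const int gbit_links_per_rack = 128; /* 1-Gbit Ethernet IO links per rack */
        const int chips_per_rack = 1024;     /* node chips per rack */

        printf("sustained compute : %.1f Tflops\n", racks * tflops_per_rack);
        printf("external IO       : %d Gbit/s\n", racks * gbit_links_per_rack);
        printf("node chips        : %d\n", racks * chips_per_rack);
        return 0;
    }

With these inputs the sketch gives roughly 15 Tflops of sustained compute, 768 Gbit/s of external IO capacity, and 6144 node chips.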

Scalable Algorithms for the Solutions of Large Sparse Linear Systems
Jacko Koster, University of Bergen

SCI Socket: The fastest socket on earth and the impact on storage and applications
Atle Vesterkjær, Dolphin Interconnect Solutions Inc.

The SCI SOCKET software provides a fast and transparent way for applications using Berkeley sockets (TCP/UDP/IP) to use SCI as the transport medium. The major benefits are plug-and-play installation, high bandwidth and much lower latency than network technologies like Gigabit Ethernet, InfiniBand and Myrinet. Real benchmarks show that a complete 1-byte socket send and socket receive is completed in 2.26 us (full round-trip in 4.52 us). SCI SOCKET has been tested using (cluster) file systems like PVFS and Lustre, NFS, iSCSI, applications like MySQL and Oracle, and libraries like PVM, LAM and ScaMPI. It supports accumulated throughput using multiple adapters and transparent automatic failover recovery.

Any application using Ethernet can run on SCI SOCKET. No patching, modifications or recompilation is required, just install and run!

The presentation will give a short overview of the technology and the results from using SCI SOCKET to boost the performance of storage and applications.
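
To illustrate what "no patching, modifications or recompilation" means in practice, here is an ordinary Berkeley sockets client in C. Nothing in it is SCI-specific, which is exactly the point: the same calls are carried over SCI instead of Ethernet. The host address and port are placeholders, and the snippet is an illustration rather than a benchmark.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Ordinary Berkeley sockets client; no SCI-specific code appears anywhere. */
    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5000);                        /* placeholder port */
        inet_pton(AF_INET, "192.168.0.2", &peer.sin_addr);  /* placeholder host */

        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
            perror("connect");
            return 1;
        }

        const char msg[1] = { 'x' };   /* a 1-byte payload, as in the latency figure above */
        char reply[1];
        send(fd, msg, 1, 0);
        recv(fd, reply, 1, 0);         /* the receive completes the round trip */

        close(fd);
        return 0;
    }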

Current Status of InfiniBand for HPC
Peter Kjellström, Linköping University

InfiniBand is an interconnect that has received quite a lot of attention lately. The long-term goals of InfiniBand reach far beyond the HPC world. The number of devices and software projects that mention InfiniBand is quite impressive.

Being immersed in all this, it is easy to forget that InfiniBand is very young and that many of the impressive applications are in fact still on the drawing board.

My talk will focus on the following two things:

  1. A short introduction to InfiniBand.
  2. What works and exists today? Performance, hardware and software, from an HPC perspective.

Using Linux Clusters for Full-Scale Simulation of Cardiac Electrophysiology
Xing Cai, Simula Research Laboratory

The main theme of this presentation is an advanced parallel electro-cardiac simulator, which employs anisotropic and inhomogeneous conductivities in realistic three-dimensional geometries modeling both the heart and the torso. The Bidomain equations of electrophysiology constitute the main part of the mathematical model, which also involves a complicated system of ordinary differential equations. It will be shown that good overall parallel performance relies on at least two factors. First, the serial numerical strategy must find a parallel substitute that is scalable with respect to both convergence and work amount. Second, care must be taken to avoid unnecessary duplicated local computations while maintaining an acceptable level of load balancing. We report our experience of running parallel cardiac simulations on an Itanium Linux cluster, involving more than 150 million degrees of freedom.
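
For reference, the Bidomain model couples the transmembrane potential v and the extracellular potential u_e through two coupled PDEs plus a cell-model ODE system. One common textbook form (not necessarily the exact formulation used in this simulator) is

    \chi \left( C_m \frac{\partial v}{\partial t} + I_{\mathrm{ion}}(v, s) \right)
        = \nabla \cdot (M_i \nabla v) + \nabla \cdot (M_i \nabla u_e),
    0 = \nabla \cdot (M_i \nabla v) + \nabla \cdot \big( (M_i + M_e) \nabla u_e \big),
    \frac{\partial s}{\partial t} = F(v, s),

where M_i and M_e are the intracellular and extracellular conductivity tensors, \chi is the membrane surface-to-volume ratio and C_m the membrane capacitance. The anisotropic and inhomogeneous conductivities mentioned above enter through M_i and M_e, and the "complicated system of ordinary differential equations" corresponds to the cell-model variables s.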

Bringing Space On Line: High-Performance Computing for a Distributed Space Probing Sensor Network
Lars K. S. Daldorff, Uppsala University

Advances in information and communications technologies have revolutionised the way we exchange and use information and have helped increase dramatically our ability to understand the physical world. In the coming years, a second IT revolution is set to unfold: the connection of information systems directly to the environment.

The LOFAR (Low Frequency Array; www.lofar.org) infrastructure currently being built in the Netherlands will be a practical realisation of such a sensor web. The main objective of the LOIS subproject (LOFAR Outrigger in Scandinavia; www.lois-space.net) is to supplement the receive-only LOFAR radio telescope with a powerful software radar capability in southern Sweden, enabling active probing deeper into space than any existing space probing facility.

LOFAR/LOIS adopts a coherent, holistic perspective on how to exploit high-speed network infrastructures, netted sensors, and high-performance computing for building sensor webs of continental dimensions.

The LOIS project currently focuses on the challenges associated with the generation, transport, management and processing of sensor data at extremely high rates (many Terabits/s) over large regions (many hundreds of kilometres) and on the simulation of the observational data for optimum design of the hardware and software for advanced space and Earth observations.

EVERGROW - probing the Internet of 2025
Erik Aurell, SICS and Royal Institute of Technology

Application Performance on High-End and Commodity-class Computers
Martyn F. Guest, CLRC Daresbury Laboratory

Commodity-based clusters now provide an established, viable, cost-effective alternative for the provision of High Performance Computing. In this presentation we compare the performance of a variety of clusters in the support of major research and production codes with current high-end hardware, such as the IBM p690+ series and the SGI Altix 3700, together with the older Compaq AlphaServer SC and SGI Origin 3800. Our focus lies in applications and looks to address the differing demands from the fields of Capability and Capacity computing. The results concentrate on the areas of computational chemistry, computational materials and computational engineering. Based on simple metrics, we consider the performance of a variety of codes, including NWChem and GAMESS-UK, CPMD, DLPOLY and CHARMM, plus ANGUS and PCHAN, and in each case identify the associated bottlenecks.

We overview performance data from some twenty commodity-based systems (CS1-CS20), featuring Intel IA32 and IA64 plus AMD Athlon and Opteron architectures, coupled to traditional Beowulf interconnects, such as Myrinet and Gbit Ethernet, plus the SCALI/SCI, Infiniband and Quadrics QSNet interconnect technologies.

Linux Performance Analysis Tools: Parallel, Serial and I/O Performance Characterization
Philip J. Mucci, Per Ekman, Royal Institute of Technology

This talk will introduce Open Source performance analysis tools on Linux clusters. These tools are meant to provide information for characterization and optimization of serial and parallel applications on large Linux systems. The talk will include numerous hardware performance analysis tools that use PAPI, the Performance Application Programming Interface. Recent work done at PDC/KTH on developing a performance monitoring infrastructure will also be covered. The latter half of the talk will introduce a new tool called IOTrack. The goal of this tool is to be able to efficiently and passively characterize the I/O performance of an application. Sample data from runs of a large application will be presented along with future direction for the tool.
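
As an illustration of the kind of hardware-counter access such tools build on, here is a minimal PAPI sketch (not taken from the talk) that reads total cycles and floating-point operations around a region of interest. Preset events such as PAPI_FP_OPS are only available where the hardware exposes matching counters, and error handling is abbreviated.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long counts[2];

        /* Initialize PAPI and set up two preset counters. */
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles */
        PAPI_add_event(eventset, PAPI_FP_OPS);    /* floating-point operations */

        PAPI_start(eventset);

        /* ... region of interest: the loop or routine being characterized ... */
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; i++)
            x += 1.0e-6 * i;

        PAPI_stop(eventset, counts);

        printf("cycles = %lld, fp ops = %lld\n", counts[0], counts[1]);
        return 0;
    }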

MPI Microbenchmarks: Misleading and Dangerous
Greg Lindahl, Pathscale Inc.

HPC4U: Closing the gap between Resource Management and the Next Generation Grid
Matthias Hovestadt, University of Paderborn

Next Generation Grid applications will demand from Grid middleware a flexible negotiation mechanism supporting various kinds of Quality-of-Service (QoS) guarantees. In this context, a QoS guarantee may cover simultaneous allocations of various kinds of resources with a requested level of fault tolerance, specified in the form of Service Level Agreements (SLAs). Currently, a gap exists between the capabilities of Grid middleware and the underlying resource management systems concerning their support for QoS and SLA negotiation. In this talk we will present an approach which closes this gap. The EU-funded project HPC4U will provide an SLA-aware and Grid-enabled Resource Management System which includes SLA negotiation and SLA-aware scheduling functionality, and provides fault tolerance by means of application-transparent checkpointing mechanisms.

TetSplat: Interactive Visualization of Huge Tetrahedral Meshes
Ken Museth, Linköping University

Museth will present a novel approach to interactive visualization and exploration of large unstructured tetrahedral meshes. These massive 3D meshes are used in mission-critical CFD and structural mechanics simulations, and typically sample multiple field values on several millions of unstructured grid points. Our method relies on the preprocessing of the tetrahedral mesh to partition it into non-convex boundaries and internal fragments that are subsequently encoded into compressed multi-resolution data representations. These compact hierarchical data structures are then adaptively rendered and probed in real-time on a commodity PC. Our point-based rendering algorithm, which is inspired by QSplat, employs a simple but highly efficient splatting technique that guarantees interactive frame-rates regardless of the size of the input mesh and the available rendering hardware. It furthermore allows for real-time probing of the volumetric data-set through constructive solid geometry operations as well as interactive editing of color transfer functions for an arbitrary number of field values. Thus, the presented visualization technique allows end-users for the first time to interactively render and explore very large unstructured tetrahedral meshes on relatively inexpensive hardware.

URL: http://www.gg.itn.liu.se

GRIA - Grid Resources for Industrial Applications
Steve Taylor, IT Innovation, UK
Antonella Frigerio, CESI, IT

GRIA is an infrastructure that permits commercial use of the Grid. If you are an HPC provider, you may rent out your spare CPU cycles using the GRIA infrastructure. If you need HPC, you may outsource it using GRIA. GRIA comprises a client-side API and a server-side infrastructure based on Web Services technology, so that clients and service providers may both benefit from the Grid. A typical use of GRIA is as follows (a hypothetical code sketch of this workflow appears after the list):

  1. A service provider connects an application code (for example a CPU-intensive Finite Element code) to GRIA.
  2. A client wishing to use that FE code contacts the service provider and opens a trade account.
  3. Once the trade account is approved the client may request an allocation of the service provider's computational resources for the FE code.
  4. The client can upload data and run the FE code at the service provider. When the job has finished, the client may download the results.
  5. Usage of the service provider's computational resources is recorded against the resource allocation, reducing the amount available to the client, and the client may store data and run jobs as long as some of the allocation is left.
  6. The cost of the resource allocation will be recorded in the client's trade account at the service provider.
  7. The client makes payments to the service provider by conventional means, and these are recorded in the trade account.
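
The workflow above can be traced in a purely hypothetical client-side sketch. None of the function names below exist in the real GRIA API, which is a Web Services interface; the stubs only serve to show the order of interactions.

    #include <stdio.h>

    /* Hypothetical illustration of the GRIA workflow (steps 2-7 above).
     * These names are invented for this sketch and are not the GRIA API. */

    typedef int account_t;
    typedef int allocation_t;
    typedef int job_t;

    static account_t open_trade_account(const char *provider)
    { printf("trade account opened at %s\n", provider); return 1; }

    static allocation_t request_allocation(account_t acct, int cpu_hours)
    { (void)acct; printf("allocated %d CPU hours\n", cpu_hours); return 1; }

    static void upload_data(allocation_t alloc, const char *file)
    { (void)alloc; printf("uploaded %s\n", file); }

    static job_t run_job(allocation_t alloc, const char *code)
    { (void)alloc; printf("running %s\n", code); return 1; }

    static void download_results(job_t job, const char *file)
    { (void)job; printf("downloaded %s\n", file); }

    int main(void)
    {
        account_t acct = open_trade_account("hpc-provider.example.org");  /* step 2 */
        allocation_t alloc = request_allocation(acct, 100);               /* step 3 */

        upload_data(alloc, "model.dat");                                  /* step 4 */
        job_t job = run_job(alloc, "fe_solver");
        download_results(job, "results.dat");

        /* Steps 5-7: usage is debited from the allocation, the cost is booked
         * on the trade account, and payment is settled by conventional means
         * outside the system. */
        return 0;
    }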

CESI is based in Italy and provides structural simulation services. CESI would like to use GRIA to:

  • Make their structural simulation codes available on the Grid, and couple this with a consultancy-based service, whereby experts at CESI may assist a client-side user with their simulation. (Here CESI is a GRIA service provider.)
  • Run their simulation code at third party HPC providers at critical times to get a quicker turnaround of results for their clients. (Here CESI is a GRIA client.)

We describe the use of GRIA in this structural simulation sector, and how it can benefit both the client and the service provider.


NGN Abstracts

Activities and Perspectives of IHPC&IS
Vladimir Korkhov, Institute for High Performance Computing and Information Systems

IHPC&IS carries out fundamental and applied research requiring high-performance computing and complex mathematical modelling: plasma reactor simulations, marine decision support systems, climate change forecasting, etc.

One of the institute's departments is the Center for Supercomputing Applications (CSA), which provides high-performance computing resources to institute partners (SPbSU, PNPI, etc.). CSA maintains a wide range of computing resources based on various architectures (parallel, vector-parallel, NUMA, etc.).

IHPC&DB has participated in a number of Grid-related projects since 2001: the first deployment of a Grid testbed with PNPI (St. Petersburg Nuclear Physics Institute) in 2001, followed by projects with the University of Amsterdam (Virtual Laboratory on the Grid; Dynamite, dynamic load balancing for parallel applications; and High Performance Simulation on the Grid in a Russian-Dutch testbed funded by NWO+RFFI).

