October 17-18, 2006
National Supercomputer Centre (NSC)
Linköping University, Sweden
Programme for the 7th Annual Workshop on Linux Cluster for Super Computing
Tutorials
Two tutorials are offered this year:
T2 - The Gaussian 03 tutorial offers a unique opportunity for the quantum chemical community in Sweden to get an in-depth presentation of the most widely used quantum chemistry code of today.
T3 - The Efficient parallel applications on NUMA machines tutorial provides a presentation of the Altix supercomputer, with special focus on Sweden's largest shared memory resource, Mozart at NSC.
The Using NSC systems tutorial (T1) is cancelled.
For detailed information on the tutorials, please click on the links above.
Workshop
The workshop seminars will be given in auditorium Visionen in the B-house, entrance 27, with wireless internet connection. [map]
Multi-core Beowulf Clusters for Petaflops-scale Computing
Professor Thomas Sterling
Department of Computer Science
Center of Computation and Technology
Louisiana State University
The continuing opportunity presented by the semiconductor technology trends characterized as Moore's Law also imposes significant challenges on processor, system, and software designers. The point of diminishing returns has been reached in the exploitation of logic complexity in processor design, and this, combined with severe power-consumption problems, has forced the industry to the strategy of integrating multiple processor cores on a single die. This "multi-core" methodology is a dramatic change from prior conventional practice and demands, for the first time, that the mainstream hardware and software community embrace parallel processing. Commodity Linux clusters such as low-cost Beowulf-class systems must necessarily integrate multi-core components, in both homogeneous and heterogeneous structures. It can be anticipated that most commercial processing chips will incorporate eight cores before the end of the decade, and with additional heterogeneous structures integrating accelerators, this degree of parallelism is already being achieved. Large commodity clusters in the hundreds-of-Teraflops range are already being planned, and Petaflops-scale clusters will be deployed in the early part of the next decade if not before. While promising to continue the growth of peak performance, multi-core components will aggravate the challenge of achieving effective sustained performance for all but the most trivial distributed application algorithms. Cache fragmentation, the memory wall, and system-wide latency will all exacerbate cluster operation. ParalleX is an innovative model of computation being developed to address these challenges and take advantage of future multi-core clusters as they scale to Petaflops capability. ParalleX replaces the dominant communicating sequential processes model (e.g. MPI) with one based on message-driven, split-phase transaction, multi-threaded processing. It eliminates global barriers in favour of local futures-based synchronization.
ParalleX computing exhibits intrinsic latency hiding while exposing abundant parallelism, especially for sparse data structures with embedded meta-data such as directed graphs. This talk will discuss the current trends towards Petascale multi-core Beowulf clusters and present the innovative ParalleX model of computation that may provide a framework for addressing the many challenges implied by these new system architectures.
Linux on different architectures, a comparison of Cell, Intel, Opteron, and Blue Gene
Nils Smeds
IBM Deep Computing
IBM has shown a long-lasting dedication to Linux over the years. Today, Linux is a supported alternative on all IBM platforms, from laptops and workstations to supercomputers such as BlueGene and "RoadRunner". The most recently released platform is the Cell Broadband Engine, a CPU co-developed with Sony and Toshiba. This talk will focus on the Cell BE: its architecture, its application areas, and how you can get started programming it.
Multicore from an Application's Perspective
Professor Erik Hagersten
Uppsala University, Sweden
The multicore revolution is driven by the hardware vendors in a desperate attempt to continue their quest for performance. While this strategy has proven successful in some application areas, the performance footprint offered by multicore technology is disruptive in many areas, such as cache size per thread, inter-thread communication cost, memory bandwidth per thread, and the choice between capacity and capability computing for increased performance. In this talk we will identify these disruptive technology areas and their corresponding application challenges, and discuss some possible ways of overcoming these problems. We show how fairly straightforward techniques can be used to speed up Gauss-Seidel by a factor of three.
A Shared Memory Programming Technique Applied to Computational Chemistry Programs
Dr Roberto Gomperts
Silicon Graphics Inc.
A parallelization technique for shared memory (OpenMP) that we have used to parallelize sequential programs will be presented. This technique has been applied successfully in the past to various computational chemistry programs such as Gaussian (quantum chemistry) and Amber and CHARMM (molecular dynamics).
Using FPGAs in Supercomputing - Reconfigurable Supercomputing
Stefan Möhl
Co-founder, VP, CTO, Mitrionics AB, Sweden
Using FPGAs to accelerate computations has long been a topic for researchers seeking ways to push the limits of supercomputing. Today FPGA supercomputing generates more interest than ever, with major system vendors offering FPGA-equipped computers off the shelf, making the technology much more accessible to a wider audience. But putting FPGAs in supercomputers radically changes the playing field: traditional hardware-design methods are no longer applicable, for several reasons. This talk presents how Mitrionics makes it possible to run software in FPGAs by putting a processor in the FPGA, allowing the user to program the processor instead of designing an electronic circuit to place in the FPGA.
From gaming to serious supercomputing fun
Juan Jose Porta
IBM Boeblingen Labs, Germany
The Cell Broadband Engine (CBE, a.k.a. Cell microprocessor) has been jointly developed by IBM, Sony and Toshiba. The Cell Broadband Architecture is intended to be scalable through microprocessor integration of parallel vector processing, where a general-purpose Power processor core is interconnected with eight special-purpose streaming SIMD cores ("Synergistic Processing Elements" or SPE). The first major commercial application of Cell is in Sony's upcoming PlayStation 3 game console. Cell was developed to be a general purpose processor and also a multimedia processor.
Exploiting the CBE as a standard Power Architecture processor with tightly attached (on-chip) SIMD acceleration capabilities opens up a non-disruptive migration path to significantly improve the price/performance competitiveness (Teraflop per €, Watts and/or floorspace) of Power blades and clusters. The accelerator-based programming model has the additional attractiveness to leverage the Linux on Power and VMX ecosystems while opening up opportunities to push selected applications into the supercomputing realm.
Shifting computational work from the main PPE processors to SPEs (by running special subroutines that take over portions of the application) is a straightforward extension of current VMX tuning techniques, enabling users to solve complex problems in less time without having to deal with the complexity of reconfigurable computing techniques like FPGAs. HPC segments that can potentially benefit from the CBE capabilities include astrophysics, bioinformatics, computational chemistry, medical imaging, seismic processing, climate modelling, computational fluid dynamics, financial engineering, quantum chromodynamics and other compute-intensive areas.
This presentation will discuss the concepts of the new Cell Broadband Engine and its impact on software development and performance. Operational environment and programming models concepts, as well as first performance results for different benchmarks and applications will be presented.
NDGF - a Joint Nordic Grid Facility
Lars Fischer
NDGF, Copenhagen, Denmark
NDGF - the Nordic Data Grid Facility - is a production grid facility that leverages existing computational resources and grid infrastructures in the Nordic countries. NDGF was established in the spring of 2006. The mission of NDGF is to ensure that researchers in the Nordic countries can create and participate in computational challenges and e-Science projects of a scope and size unreachable for the national research groups alone. NDGF facilitates the use of computational and data storage resources, undertakes middleware development and application integration, and provides project management. The talk will elaborate on the motivation and organization of NDGF, give a status report for NDGF, and present some early use cases.
A new 7 TFLOP cluster in NOTUR
Dr Cyril Banino-Rokkones
Dept. of Computer & Information Science (IDI)
NTNU Trondheim, Norway
From the beginning of November, a new 7.5 TFLOPS cluster will be available in NOTUR. The cluster is based on the IBM Power5+ architecture with IBM Federation technology as interconnect. The new computing resource will almost triple NOTUR's total computing power. It will be used for operational weather forecasting, CFD, construction mechanics, and physics calculations. The first part of the talk is a presentation of the system, while the second part reports promising uniprocessor experiments performed on a provisional test-bed installation.
HPC facilities at CSC, the Finnish IT Center for Science
Juha Fagerholm
CSC, Helsinki, Finland
CSC has recently decided on new facilities for HPC resources in Finland. The aim was to get both high-end capacity and price-performance capacity. The selection criteria were overall performance, system functionality, total costs, and cooperation aspects. The systems will be installed in stages during 2006-2008 and should be efficient enough to serve the Finnish computational science community for the next few years. Performance was evaluated using several application benchmarks carefully selected to cover different fields of computational science that are strong in Finland. Most of the available resources will be allocated to research groups based on their research proposals. The Nordic computational Grand Challenge Survey, initiated to collect the ongoing and upcoming large-scale computing projects in the Nordic countries, will be used when choosing grand challenge projects for the supercomputer.
Experiences from Lustre deployment at NSC
Peter Kjellström
National Supercomputer Centre, Sweden
Lustre is an open source parallel file system developed by Cluster File Systems (CFS). Lustre was designed for use in HPC environments and is deployed on several of the largest machines in the world. This talk will contain both an introduction to Lustre (general information about the file system) and a description of what we at NSC have done (configurations, goals, what the users thought, what broke, and what we will do next...).
dCache, the Peta-Scale Storage Element
Patrick Fuhrmann
With the increasing size of modern experiments, like those at the upcoming LHC accelerator at CERN, and the political demand of governments to spend money locally as well as to provide computing infrastructure to their national laboratories, the requirements on data storage systems have increased significantly. Besides the traditional ability to store data on disk or tape devices, storage systems presently have to provide interfaces to wide area transfer protocols as well as to protocols for shaping the bandwidth of transfers and managing local storage. Moreover, they need to offer a set of POSIX-like access protocols in order to allow analysis jobs on their attached computing farms to get random, low-latency access to datasets. Highly demanding analyses may even make it necessary to replicate frequently used datasets to improve overall throughput. In addition, depending on the storage strategy of the particular experiment, some laboratories need to persistently store data in tertiary storage, which requires smart scheduling of data streams between disk and tape storage. dCache, one of the official LCG Storage Elements, has proven to cope with these peta-scale storage requirements and will be introduced in this presentation.
The Swedish HPC landscape
Sverker Holmgren
Uppsala University, Sweden
The Swedish metacentre for HPC, SNIC (Swedish National Infrastructure for Computing), has six member centres that provide strong national HPC services. SNIC has recently produced a landscape document that puts the Swedish HPC infrastructure in the context of current trends in the computational sciences as well as in service and hardware developments. This document is presented, including an analysis of what types of HPC resources are needed to provide Swedish researchers with a competitive HPC infrastructure. Road maps outlining how SNIC will implement the landscape are also described, giving blueprints for the Swedish HPC landscape over the next three years.