Swedish eInfrastructure for Research: The SNIC Landscape Document for 2010-2013

Sverker Holmgren, SNIC

The Swedish National Infrastructure for Computing (SNIC) has produced a new Landscape Document covering 2010-2013. The document describes Swedish eInfrastructure in the context of current developments in research as well as in eInfrastructure services and hardware, with a focus on the services provided by SNIC. The document analyses the resources needed to keep providing Swedish researchers with a competitive eInfrastructure in the future. The future development of SNIC is outlined from the perspective of the services provided to users, and the infrastructure needed to provide these services is derived from that.

MariCel: the PRACE prototype at BSC

Gabriele Carteni, Barcelona Supercomputing Centre, Barcelona, Spain

The high performance computing paradigm has changed after the end of the free ride for faster clock frequencies. Now, "increase concurrency and specialization" is the new approach for sustainable supercomputing. This talk will start with an introductory overview of the Cell Broadband Engine Architecture (STI Alliance), a heterogeneous multi-core solution which leverages a balance between computing power and a novel memory hierarchy model on a single chip. The core of the talk will be focused on "MariCel", a heterogeneous multi-core cluster based on the IBM PowerXCell8i processor. MariCel is the BSC's prototype belonging to the European PRACE project, which prepares the creation of a persistent pan-European HPC service through the evaluation of the candidate technologies for the future petaflop/s computers. MariCel will be introduced with special attention to the performance obtained during the benchmarking activity, the issues encountered during the pre-production phase and the software development environment provided to the users.

Ab-initio studies of advanced multifunctional materials

Biplab Sanyal, UPPMAX and Department of Physics and Materials Science, Uppsala University

The development of new materials to be used in advanced technologies needs a thorough understanding of properties at the atomic scale, where quantum mechanical interactions play a crucial role. I will demonstrate a few important examples of computational materials research where ab-initio density functional theory has been employed to understand the physics of materials. The examples include magnetic semiconductors and graphene.

Mesoscopic simulations of many-body protein interactions

Mikael Lund, Kemicentrum, Lund University

The mutual interaction between protein molecules governs a range of biochemical processes vital for the living cell. Inadvertent protein aggregation can have severe consequences, as observed in amyloid diseases such as Alzheimer's and Parkinson's. However, protein association can also be utilized in non-detrimental technical applications and is hence subject to intense investigation within the field of food processing. Theoretical studies of protein-protein interactions typically involve merely two protein molecules, and a significant amount of coarse graining is required in order to obtain converged thermodynamic properties. Here we investigate many-body effects using Metropolis Monte Carlo simulations of up to forty proteins described in mesoscopic detail. The focus is on electrostatic interactions that can lead to significant orientational ordering of the bio-molecules [1] and result in salt- and pH-induced phase transitions.

The explicit part of the statistical mechanical averaging is done over all protein orientations and positions as well as volumes (NPT ensemble). This amounts to a large number of possible configurations and the calculations are therefore best carried out in parallel. Our free, open source Monte Carlo simulation software, Faunus, incorporates parallelization through a combination of OpenMP and the "embarrassing" approach.
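The Metropolis acceptance rule and the "embarrassing" style of parallelization mentioned above can be illustrated with a minimal sketch. This is not Faunus code: the function names, the toy harmonic energy E(x) = x^2 / 2 and the step size are assumptions chosen only to keep the example self-contained. Each replica is an independent Markov chain with its own seed, so the replicas could be dispatched to separate processes or nodes without any communication.

```python
import math
import random


def metropolis_accept(delta_energy, kT, uniform_draw):
    """Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-dE / kT)."""
    if delta_energy <= 0.0:
        return True
    return uniform_draw < math.exp(-delta_energy / kT)


def run_replica(seed, n_steps=20000, kT=1.0, max_step=1.0):
    """One independent Markov chain on a toy harmonic energy E(x) = x^2 / 2
    (an assumption for illustration). Returns the sampled average of x^2."""
    rng = random.Random(seed)  # private generator: replicas share no state
    x = 0.0
    total_x2 = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)
        delta = 0.5 * trial * trial - 0.5 * x * x
        if metropolis_accept(delta, kT, rng.random()):
            x = trial
        total_x2 += x * x  # the current state is counted even after a rejection
    return total_x2 / n_steps


def average_over_replicas(seeds):
    """Combine independent replicas. The loop body has no dependencies between
    iterations, so it could be farmed out to separate processes or nodes."""
    results = [run_replica(seed) for seed in seeds]
    return sum(results) / len(results)
```

For this toy energy the exact ensemble average of x^2 equals kT, which gives a simple sanity check on the sampler before moving to a realistic energy function.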

[1] Persson B.A. and Lund M., Phys. Chem. Chem. Phys., 2009, 11:8879. Association and electrostatic steering of alpha-lactalbumin–lysozyme heterodimers.

Whole genome resequencing reveals loci under selection during chicken domestication

Carl-Johan Rubin, Department of Medical Biochemistry and Microbiology, Uppsala University

Introduction: For several thousand years humans have kept domestic animals whose phenotypic repertoires have been tailored to meet our needs by artificial selection. Thus, domestic animals constitute an important resource for the genetic dissection of phenotypic variation. The domestication of chicken from its wild ancestor was initiated about 7,000 years ago, and this process has yielded a huge variety of divergent chicken lines. The chicken genome is only one third the size of the human counterpart and is much less repetitive, which makes it an ideal genome for cost-efficient resequencing using newly developed sequencing methods.

Materials and methods: We used the SOLiD (Lifetech) technology to generate short sequence reads (35 bp) from ten separate chicken population pools. Reads were mapped to the reference genome at a depth of 4-5X coverage per pool, with 80-90% of the reference genome being covered by at least one read. The ten sampled lines comprised four layer lines that have been selected for egg laying, four broiler lines that have been selected for meat production, and two red junglefowl zoo populations.

Objectives: A. Identify the most common allele in each line at each polymorphic site in the non-repetitive part of the genome. B. Identify loci that have undergone positive selection during chicken domestication or breed development (selective sweeps).

Results: We identified about 7.5 million Single Nucleotide Polymorphisms (SNPs), more than twice as many as have previously been described in chicken. To detect selective sweeps we combined SNP allele count data from populations selected for similar traits and searched for stretches of reduced heterozygosity (H). This approach revealed a marked reduction in H for a positive control region where all sequenced lines of domestic chicken are known to share a haplotype. Several other regions had an H as low as or lower than that of the positive control region, and these may represent selective sweeps that were important for the development of domestic chicken.
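A scan for stretches of reduced heterozygosity can be sketched as follows. The abstract does not state which estimator or window scheme was used, so this is only an illustration under stated assumptions: H at a biallelic SNP is taken as the expected heterozygosity 2p(1-p) computed from pooled allele counts, and SNPs are grouped into non-overlapping windows of fixed length. The function names and the input layout (position, reference count, alternative count) are hypothetical.

```python
from collections import defaultdict


def expected_heterozygosity(ref_count, alt_count):
    """Expected heterozygosity 2p(1-p) at one biallelic site,
    with p estimated from pooled allele counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    p = ref_count / total
    return 2.0 * p * (1.0 - p)


def windowed_heterozygosity(snps, window_size):
    """Average H over non-overlapping windows (an assumed scheme).

    snps: iterable of (position, ref_count, alt_count) tuples.
    Returns a dict mapping window start position to mean H in that window.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, ref_count, alt_count in snps:
        start = (position // window_size) * window_size
        sums[start] += expected_heterozygosity(ref_count, alt_count)
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

Windows whose mean H falls well below the genome-wide level would then be candidate sweep regions; the threshold and any correction for sequencing depth are left open here.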

Constructing error-correcting codes with huge distances

Florian Hug, Lund University

The class of error-correcting convolutional codes is commonly used for reliable data transmission in mobile, satellite, and space communication. Since larger capacities and smaller error probabilities are demanded simultaneously, convolutional codes with large free distances are needed. Such convolutional codes are in general characterized by large overall constraint lengths, which increases the complexity of determining the corresponding code properties, such as the free distance.

BEAST (Bidirectional Efficient Algorithm for Searching Trees) will be presented as an alternative, less complex approach to determining the free distance of convolutional codes. As an example, a rate R = 5/20 hypergraph-based woven convolutional code with overall constraint length 67 and constituent convolutional codes is presented. Even when using BEAST, determining the free distance of such a convolutional code is a challenge. Using parallel processing and a common huge storage, it was possible to determine that this convolutional code has free distance 120, which is remarkably large.
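To make the notion of free distance concrete, the sketch below computes it for small rate 1/n feedforward encoders. It is explicitly not BEAST: it is a plain one-directional shortest-path (Dijkstra) search over the encoder state graph, which is only feasible for short constraint lengths and is exactly the kind of exhaustive approach that becomes too expensive for codes like the one described above. The function name and the bit convention (the current input bit occupies the highest bit of each generator mask) are assumptions for this example.

```python
import heapq


def free_distance(generators, memory):
    """Free distance of a rate 1/n feedforward convolutional encoder.

    generators: integer bit masks of width memory + 1, one per output.
    Finds the minimum output Hamming weight of any path that leaves the
    all-zero state and later returns to it (plain Dijkstra, not BEAST).
    Returns None if no such path exists.
    """
    def step(state, bit):
        register = (bit << memory) | state
        # Each output bit is the parity of the register masked by a generator.
        weight = sum(bin(register & g).count("1") % 2 for g in generators)
        return register >> 1, weight

    # Force the path to diverge from the zero state with an input of 1.
    start_state, start_weight = step(0, 1)
    best = {start_state: start_weight}
    heap = [(start_weight, start_state)]
    while heap:
        dist, state = heapq.heappop(heap)
        if state == 0:
            return dist  # first time the zero state is popped is optimal
        if dist > best.get(state, float("inf")):
            continue  # stale heap entry
        for bit in (0, 1):
            nxt, weight = step(state, bit)
            candidate = dist + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return None
```

For the textbook memory-2 encoder with octal generators (7, 5), calling free_distance([0b111, 0b101], 2) yields 5. The state space doubles with every extra memory element, which is why bidirectional searches and parallel processing are needed at overall constraint length 67.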

The GIRD Grid Job Management Framework

P-O Östberg, Department of Computing Science and HPC2N, Umeå University

The Grid Job Management Framework (GJMF) provides an architecture for middleware-independent Grid job and resource management. The GJMF is a composable Service-Oriented Architecture focused on flexibility and interoperability, and aims to decouple Grid applications from Grid middleware dependencies while supporting a range of possible deployment scenarios. The GJMF organizes framework capabilities in layers of services, where foundational layers provide middleware abstraction services and higher-level services aggregate and build on these to provide increasingly advanced job management capabilities.

The GJMF software prototype is implemented using Globus Toolkit 4 (GT4) and is based on current web and Grid service technologies and standards. The prototype is integrated with the GT4 and NorduGrid/ARC middlewares and provides customization points for additional middleware support. Access to job management interfaces is provided through Web Services and a Java API. The framework also defines a set of application support classes, and is planned to host a range of reference clients demonstrating use of the framework, including command line tools and graphical service monitoring clients.

The GJMF is developed as part of the GIRD multi-project, which researches fundamental theory, models, algorithms, and methods for Grid tools aimed at solving complex large-scale scientific problems. The focus of the research in this project lies on the development of design principles for reusable Grid software that facilitate application development and provide for application sustainability over versions and generations of Grid middlewares and technology. A production version of the GJMF is developed at HPC2N and supported by SNIC.

Performance Tuning for Anton, a Specialized Molecular Dynamics Machine

Ron Dror, D. E. Shaw Research, New York, USA

Anton, a massively parallel special-purpose supercomputer, accelerates molecular dynamics simulations of biological systems by orders of magnitude compared with the previous state of the art. While most of Anton's processing is performed by hardwired arithmetic pipelines, each ASIC also includes 13 programmable processor cores to sequence the hardwired pipelines and to execute certain portions of the molecular dynamics computation. In a large Anton system, these processors form a heterogeneous multiprocessor with thousands of cores.

Writing and tuning the software for this multiprocessor has been and continues to be a substantial undertaking. To optimize performance, we have exploited features of the problem domain, such as the timescales of various physical phenomena under simulation. This talk will describe the techniques used to tune Anton's software for high performance in this domain. Special attention will be given to the tools used, the manual tuning techniques that have proven successful, and the dimensions along which automatic tuning might operate.

High performance computing with GROMACS

Berk Hess, Stockholm University

The recently released version 4.0 of the GROMACS molecular simulation package has drastically improved the parallel scaling. With a few additional improvements GROMACS now scales to more than 150000 cores. Here I will discuss the algorithms that made this scaling possible, as well as several applications to large biomolecular systems.

Advanced profiling of GROMACS

Jesus Labarta, Barcelona Supercomputing Centre, Spain

The talk will describe studies of the behavior and scalability of GROMACS by using the CEPBA-tools environment, based on the trace visualizer Paraver and Dimemas, a simulator of message passing architectures. The analysis will look at the observed performance on the MareNostrum cluster, but also show the impact of architectural parameters of hypothetical target machines. We will show how simple abstract models can be derived to identify bottlenecks and to propose potential directions for increasing the scalability of GROMACS. The analysis will look at the pure MPI version but also at hybrid MPI+OpenMP parallelizations.

NSC during 20 years

Karl-Fredrik Berggren, Department of Physics, Chemistry and Biology, Linköping University

On the occasion of the 20th anniversary, the startup of supercomputing in Sweden and of NSC will be outlined. From the viewpoint of high-performance computing, this is ancient history. In the talk I will recall the early technological preconditions, science and financial policies, and some gossip. I will also outline the development of NSC into a national supercomputer centre for the needs of today and speculate briefly about the future.

Perspective on aircraft design and HPC development over 20 years

Mattias Sillén, Saab Aerosystems, Linköping

Over the last 20 years the aerospace industry has seen a dramatic development of the simulation capability, both in terms of physical modeling and geometry complexity. This is tightly related to the strong increase in computational capacity for high performance computers which has spurred the development of new physical models and improved numerical algorithms. Today, numerical simulations are essential in aircraft design for analysis and optimization. This talk will focus on changes in aircraft design methodology coupled to HPC and simulation tool development.

History of the SMHI and NSC partnership

Per Undén, SMHI

Numerical Weather Prediction (NWP) on large-scale computers was started in Sweden in the mid-50s by scientists at Stockholm University and the Air Force. SMHI was soon involved in the activity. SMHI procured its own computers in the 60s and 70s, but towards the mid-90s it became apparent that, even with the support from the Air Force, owning one's own supercomputer was no longer viable.

The cooperation with NSC was initiated in 1994, and this proved to be the most cost-effective way of providing the supercomputer performance that was needed for the operational meteorological as well as oceanographic models at the time. The operational models need a very high peak performance in c. 4 time slots during the day, whereas in between, research work by SMHI and others can share the time.

SMHI was primarily interested in MPPs at this time, and a lot of work on parallelisation had been done. Fortunately NSC provided both a shared memory vector system (CRAY C90) and the T3E MPP system. The operational model was set up on both, since the former shared memory system was far more reliable for the operational 24/7 schedule. The T3E was however valuable for providing a lot of research computing resources (and often operational work too).

There was quite a comprehensive project to set up the whole operational chain from SMHI to the NSC systems and back to SMHI for the results. It was complicated because it involved SMHI servers, communication, two quite different architectures at NSC and, not least, ongoing program development of the SMHI HIRLAM model used.

Regional climate modelling started at the same time at SMHI, and a sizeable portion of the computing resources was used for the very long runs needed. An adapted version of the NWP model HIRLAM was used for this. The climate scenario activities expanded so much that they needed their own resources, far exceeding what was used for weather forecasting.

NSC has since then provided SMHI with computing services on a succession of machines and also a lot of data storage capacity. A recent extension of this is the implementation and development of the ECMWF MARS archiving system at NSC. The partnership between SMHI and NSC continues to work very well and is, in international comparison, the most cost-effective solution for SMHI given the sort of computing budgets that are possible at SMHI.

SUNET during 20 years

Hans Wallberg, SUNET

SUNET, the Swedish University Computer Network, is the Swedish National Research and Education Network organization (NREN). SUNET has evolved from an X.25-based network for terminal access with a capacity of 300 bit/s to today's network, called OptoSunet, with a total capacity of 5 Tbit/s or more. Since 1988 SUNET has been part of the Internet. In the beginning the Internet connection was a 56 kbit/s satellite link shared by the Nordic countries. Today the connections to the rest of the world, including the Internet, are via a number of 10 Gbit/s and 40 Gbit/s fibre-optic links.

HPC in Cardiovascular Medicine — the need for LES and FSI

Matts Karlsson, Department of Management and Engineering, Linköping University

Wall Shear Stress (WSS), the frictional load from the blood on the vessel wall, is an important factor in the genesis and progression of atherosclerosis, possibly affecting plaque formation by influencing the function of the endothelial cells. Current research indicates that low and/or oscillating WSS is important for the development of disease; however, high shear stress may stimulate thrombosis at the site of the injury, which may eventually occlude the entire vessel, causing severe ischemic disease or infarction.

We seek to create patient-specific models of the human cardiovascular system, in particular arteries, in order to understand age-related changes in structure and function as well as to enable early detection of cardiovascular diseases such as atherosclerosis and dilatation. We utilize the basic principles of applied mechanics as well as the modelling and simulation capabilities from computational engineering and high performance computing in combination with modern imaging modalities and image processing.

Shape optimization and active flow control for improved aerodynamic properties

Sinisa Krajnovic, Division of Fluid Dynamics, Chalmers

The talk will present research on the development of an HPC process for the design and improvement of aerodynamic properties of vehicles. The optimization of the aerodynamics of road and rail vehicles has traditionally been handled through trial-and-error design procedures, which count on the skills and experience of the designer to suggest changes in the design that are likely to yield improvements. Although such a procedure usually yields an acceptable design, the use of a more rigorous optimization methodology would allow the best design to be identified. The research group at Chalmers has developed an automatic shape optimization process where simulations are run on a computer cluster such as Neolith at NSC. The optimization is made automatic by connecting the flow solver with the optimization code and a virtual geometry deformation tool in an optimization loop. Using this new process we were able not only to improve the aerodynamic performance of cars and trains but to find their best possible aerodynamic design. For the first time ever the original design of the vehicle can be fine-tuned without human interaction while preserving all the aesthetic properties of the design. With this new design process the computer cluster has not only replaced the costly wind tunnel but also adds extra value, as it provides the optimal design.

Quantum Aspects of Surface Plasmons in Reduced Dimensions — insights from computational studies

Shiwu Gao, Department of Physics, University of Gothenburg

In this talk, I will present some of our recent progress in the computational study of surface plasmons in low-dimensional nanostructures, whose sizes and shapes are tunable down to atomic precision. Our studies involve both massive code development based on linear response theory and the time-dependent local density approximation (LR-TDLDA), and extensive applications to model systems and nanomaterials including metallic thin films, linear atomic chains, nanoparticles and dimers, and pristine and doped graphene. We elaborate the development of collectivity, the effect of spatial quantization, and the anisotropy of dynamic screening of d-electrons. The energy dispersion and Landau damping of surface plasmons are determined quantum-mechanically, and are compared with classical electrodynamical models.

[1] S. Gao and Z. Yuan, Phys. Rev. B 72, 121406 (2005).
[2] J. Yan, Z. Yuan, and S. Gao, Phys. Rev. Lett. 98, 216602 (2007).
[3] Z. Yuan and S. Gao, Phys. Rev. B 73, 155411 (2006).
[4] J. Yan and S. Gao, Phys. Rev. B 78, 235413 (2008).
[5] Z. Yuan and S. Gao, Surface Science 602, 460 (2008).
[6] Z. Yuan and S. Gao, Comput. Phys. Commun. 180, 466 (2009).

Theory of simple and complex materials

Sergei Simak, Department of Physics, Chemistry and Biology, Linköping University

Our use of supercomputers for conducting theoretical projects of different complexity will be reviewed. Though the examples considered are relevant to different fields of science, from geophysics to fuel-cell technologies, they are all based on Quantum Mechanics and Density Functional Theory and require large-scale calculations. Successes and computer-related problems will be discussed.

Computational Combinatorics and Experimental Mathematics

Klas Markström, Department of Mathematics and Mathematical Statistics, Umeå University

The advent of large-scale supercomputers has made it possible to approach mathematical problems in a new way. On the one hand, computers can be used to find and construct various objects, such as codes and designs. On the other hand, they can also be used in an "experimental" way, where different properties of a mathematical object are computed exactly and then used either to formulate conjectures or in proofs. Especially in combinatorics these computations can easily dwarf any other computational task in terms of both memory and CPU time. I will survey some of my own work in these different directions.

Laser-matter interactions and particle acceleration

Mattias Marklund, Department of Physics, Umeå University

The interaction between high-intensity lasers and matter is currently seen as a means for producing more compact high-energy particle sources. I will give an overview of our activity in this field.

Numerical Simulation of Turbulent Boundary-Layer Flows

Philipp Schlatter, Mechanics, KTH, Stockholm

In this talk, the latest results obtained mainly by large parallel computations on the new Ekman cluster are presented. Spatially developing high-Reynolds number turbulent boundary layers (Re_θ up to 4300) under zero pressure gradient are studied by direct and large-eddy simulations (DNS and LES). Such flows, at least approximately, appear in a variety of technical as well as environmental applications, such as the flow around airplane wings or over the earth's surface. Our simulation is at present one of the largest that has been performed so far for that setup; the data is obtained with up to O(10^10) grid points using a parallelised, fully spectral method. The DNS and LES results are critically evaluated and validated in comparison to other simulations and experimental data. Quantities difficult or even impossible to measure, e.g. pressure fluctuations and complete Reynolds stress budgets, are evaluated and discussed. In addition, special emphasis is put on a further quantification of the large-scale structures observed in the flow, and their relation to other wall-bounded flows such as channel flow. The results clearly highlight that with today's computer power and the usage of efficient, parallel algorithms, Reynolds numbers relevant for industrial applications can be within reach for time-resolved simulations.