
Track NSC'08 - Abstracts

Problems with Regulations and Laws for Data Storage

Magnus Stenbeck, DISC director

The Database Infrastructure Committee (DISC) of the Swedish Research Council (VR) aims to create a common data infrastructure for research, using modern technology to facilitate access and increase security.

The use of data for scientific analysis is subject to regulations in at least two major areas. One is the protection of individual integrity and the associated judgments of research ethics. Rules about privacy are usually part of the basic national constitutions. Such privacy rules are present in most countries and in federations such as the USA and the EU. In Sweden the law of secrecy regulates the extent to which personal data can be used. In addition, a law of ethics trials is implemented through a hierarchy of ethics boards, which must approve the research before it can get access to data.

A second set of rules concerns ownership, i.e. copyright. The Swedish Research Council subscribes to international principles of Open Access to research results. However, the rules apply not only to research results, but also to the basic collected data and to intermediate forms of it, such as well-documented and cleaned versions of the data. Principles of sharing research data in computer files have been discussed for at least the past 50 years, and worldwide associations exist for the exchange of data, for instance the Council of European Social Science Data Archives (CESSDA) in Europe and the Inter-university Consortium for Political and Social Research (ICPSR) in the US. Copyright laws, patenting, etc. have to be married to the principles of open access, and access to data has to be controlled in a system that improves access while protecting commercial and other ownership rights.

The creation of a common infrastructure for research within social science, medicine and other disciplines that use personal data is at best very difficult, at worst illegal in Sweden today. International sharing is even more difficult. DISC is dealing with these issues using a broad approach that includes development of technology as well as legal systems, and involves both national and international collaborative efforts.

Scalable Performance in the Panasas Parallel File System

Dr. Brent Welch, Panasas

The Panasas parallel file system is used with some of the largest supercomputing clusters in the world, including the LANL RoadRunner system that is currently #1 on the Top500. It supports commercial workloads such as seismic data processing, EDA, weather simulation, semiconductor manufacturing, and financial risk analysis. This talk describes the parallel architecture of the system and gives some performance results that demonstrate its scalability.

Building Self-Healing Mass Storage Arrays for Large Cluster Systems

Toine Beckers, DataDirect Networks Inc.

With the growing need for High Performance Computing clusters (from GFlops to TFlops and even PFlops systems) in many application fields, the need for data storage capacity increases as well. This often leads to complex, difficult-to-manage storage solutions. The Silicon Storage Appliance products from DataDirect Networks provide an easy-to-manage, scalable and high-performance solution which is becoming widely accepted in the High Performance Computing community.

GPFS Performance Tuning

Klaus Gottschalk, IBM

GPFS is the proven scalable parallel file system from IBM for Linux, Windows and AIX. Configuring and using GPFS is simple and straightforward, but GPFS performance and availability can depend on using the right settings. This talk will describe the GPFS concepts and give recommendations for GPFS sizing, tuning and best practices.

Scaling Storage Up and Out

Staffan Strand, Hitachi

Different segments of HPC put extreme demands on parallel file system performance and on storage capacity, respectively. Staffan Strand from Hitachi gives an introduction to Hitachi's hardware-based architectures for addressing this, and an insight into some of the developments that can be expected on the storage side.

Long Distance InfiniBand Transport

Dr. David Southwell, Obsidian

InfiniBand is a favoured interconnect for clusters and high-performance storage, but is generally limited to intra-building connections. This presentation describes system-architecture opportunities enabled by InfiniBand range-extension devices, capable of campus, metro or even global reach communications over standard optical infrastructure.

TSM Large Backup System Performance

Frank Müller, IBM

This presentation will cover:

  • How closely TSM & GPFS are integrated.
  • Techniques for a full backup in a large scale file system environment.
  • What a memory-efficient TSM backup could look like.
  • The integration of TSM HSM in a large scale file system environment.
  • Outlook to new features in v6.1 and later releases.

NorStore, Storing Norwegian Research Data

Jan Meijer, UNINETT

NorStore is the Norwegian project to establish and operate a national data infrastructure that provides non-trivial services to scientific disciplines with a variety of needs for storing digital data. The talk will be about the vision we work to realize, the current state of affairs and some of the issues we have so far encountered.

Lustre and Other Open Source Based Storage Projects

Dr. Torben Kling-Petersen, Sun Microsystems

The Lustre parallel filesystem is the leading HPC storage solution on the market today with 7 of the top 10 supercomputers in the world (Top500) to its credit. Lustre is designed for data-intensive applications, and delivers dramatically increased throughput and I/O through intelligent serialization and separation of metadata operations from data manipulation.

This presentation will outline some of the features of the open source based Lustre file system, its architecture and functional components, and the near-term roadmap. In addition, Sun Microsystems' efforts to create a new type of high-performance storage server, open source software and simplified administration under the umbrella of Open Storage will be discussed.