PARALLEL DATA LAB 

PDL Abstract

Introduction and Purpose of the Scalable I/O Project

This white paper describes a collaborative project that brings together systems software developers, computer vendors, and applications teams to develop hardware and software systems to support scalable I/O for high performance computer systems. The project is organized around the provision of a full-scale testbed for the development and evaluation of new systems software for scalable I/O. In addition, research projects will be formed to address the scalable I/O problem from a number of perspectives, such as languages, compilers, file systems, networking software, persistent object stores, and low level system services.

Unlike parallel operating systems, for which a commonly available software platform (i.e., Mach) already exists, vendors that wish to provide parallel I/O capabilities for their MPP systems today must largely start from scratch when developing the file systems and user software needed to support a scalable I/O system. This forces many vendors to duplicate effort and leads to conflicting solutions for end users. This project intends to provide a commonly available set of software modules that can serve as the foundation for the next generation of parallel I/O systems.

To address this problem, we believe that systems software developers must work closely with both application developers and system providers. We also believe it is critical that a full-scale I/O testbed be available for development, debugging, and performance analysis and testing with real applications.

The I/O problem can be subdivided into a number of critical elements. First is the architecture of the I/O systems themselves: in this project we make few assumptions about the physical configuration of the system and its I/O devices, other than that they will scale with the memory and CPU performance of the system. Next is support for the devices themselves, provided by or in conjunction with the native operating system; as part of this effort we will develop parallel file systems that work with the native OS.

Layered onto the parallel file systems are interfaces that provide access to the parallel I/O system for both compilers and user code. A special component of this effort is the investigation of paradigms that can be embedded into High Performance Fortran (HPF) to support parallel I/O. Still higher-level functions will be provided by persistent object storage managers. Checkpoint and memory-server capabilities may also be layered onto the parallel file systems or, in some cases, directly onto the storage devices.

Since no parallel supercomputer is likely to be completely self-contained, support for external file systems and remote devices via high-speed networks is also an important part of this effort. In particular, mass-storage systems (MSS) are likely to be connected via networks; access to files stored in the MSS must not become a bottleneck. Applications projects will be sought out at each stage of development and testing to ensure that the systems address real user needs. Finally, since the effort is organized around a testbed machine, full-scale experimentation can be used to refine the concepts and implementations.

Our goal is to develop and demonstrate an integrated set of tools and software systems that provide users with a scalable I/O facility. This software will be made available to the US high-performance computing community for productization and future development.