Petascale Data Storage

Principal Investigator: Garth Gibson

"Addressing the challenges of petascale computing for scientific discovery on information storage capacity, performance, concurrency, reliability, availability, and manageability."

With the advent of new experimental facilities and more powerful supercomputers, researchers now face the task of managing, sharing and analyzing petabytes of data. The Petascale Data Storage Institute brings together data storage and management expertise to meet the high performance storage requirements of today’s DOE terascale computational science. At the same time, it identifies the storage capacity, performance, concurrency, reliability, availability and manageability problems arising from petascale computing infrastructures for scientific discovery, and sets solutions to them in motion.

Developing effective solutions for the petascale systems of the next decade depends on developing the people who will design, operate and manage these systems. The Institute will create classroom materials covering its full scope and deploy them, at a minimum, in the graduate programs of all three of its university members. Possible courses include advanced operating and distributed systems, advanced storage systems, security systems, and advanced scientific algorithms.

Petascale computing infrastructures for scientific discovery make enormous demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The last decade has shown that parallel file systems can barely keep pace with high performance computing along these dimensions, which poses a critical challenge once petascale requirements are considered. The Petascale Data Storage Institute will focus on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools. Drawing on its members' experience with applications and their diverse expertise in file and storage systems, the Institute will enable a group of researchers to collaborate extensively on developing requirements, standards, algorithms, and development and performance tools. Mechanisms for petascale storage, together with the Institute's results, will be made available to the petascale computing community. The Institute will hold periodic workshops and develop educational materials on petascale data storage for science.

For more information, please see the Institute's web page at

Institute Personnel

Participating Institutions and Co-Investigators:

Carnegie Mellon University - Garth Gibson (PI)
Lawrence Berkeley National Laboratory - John Shalf
Los Alamos National Laboratory - Gary Grider
Oak Ridge National Laboratory - Philip Roth
Pacific Northwest National Laboratory - Evan Felix
Sandia National Laboratories - Lee Ward
University of California at Santa Cruz - Darrell Long
University of Michigan at Ann Arbor - Peter Honeyman

Contact Us

Garth Gibson, PDSI PI
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Angela Miller, Administrative Asst.
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
phone: 412-268-6645


This work is sponsored by the Department of Energy, under Award Number DE-FC02-06ER25767.