PDL Abstract

A Read/Write Protocol Family for Versatile Storage Infrastructures

Carnegie Mellon University, Dept. ECE Ph.D. Dissertation CMU-PDL-05-108, October 2005.

Jay J. Wylie

Parallel Data Laboratory
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

The ideal storage infrastructure scales to meet new demands. Traditionally, the emphasis has been on the capacity and performance scalability of a storage infrastructure. Current trends towards massive storage infrastructures composed entirely of commodity components demand broader forms of scalability. The next generation of storage infrastructures must scale to tolerate more, and more varied, types of faults. Fault-scalability, the ability to tolerate large numbers of faults efficiently, is needed so that the simultaneous failures of multiple commodity components can be tolerated. Versatility, the ability to store objects with radically different resiliency (fault-tolerance) and performance requirements simultaneously and efficiently, is needed so that a deployed storage infrastructure can meet new demands as they are identified.

This dissertation develops the Read/Write Protocol Family (R/W-PF), a set of related protocols for reading and writing data objects that enables a versatile storage infrastructure to be built. The R/W-PF provides versatility: objects with different per-object resiliency requirements can be stored in the same storage infrastructure. The costs (response time, number of servers required, etc.) of storing an object are commensurate with its resiliency requirements. The R/W-PF incorporates versatile storage mechanisms, such as erasure codes, witnesses, and quorums, in its design, allowing the efficiency of read and write access to stored objects to be tuned to meet capacity and performance requirements.
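
As a rough illustration of how per-object costs can track resiliency requirements, the sketch below computes the standard arithmetic for a generic m-of-n erasure code, in which any m of n stored fragments suffice to rebuild an object. It is a minimal sketch under stated assumptions, not the cost model of the R/W-PF itself: the function name `erasure_code_costs` and the example parameters are hypothetical, and it counts only fragment losses, ignoring the witnesses, quorums, and other fault types the dissertation addresses.

```python
from fractions import Fraction


def erasure_code_costs(m: int, n: int) -> dict:
    """Costs of a generic m-of-n erasure code.

    Hypothetical helper for illustration only; it is not taken from the thesis.
    Each of the n fragments is 1/m the size of the object, and any m fragments
    are enough to reconstruct it. Replication is the special case m = 1.
    """
    if not (1 <= m <= n):
        raise ValueError("require 1 <= m <= n")
    return {
        "servers": n,                     # one fragment per server
        "tolerated_losses": n - m,        # fragments that may be lost
        "storage_blowup": Fraction(n, m), # total bytes stored / object size
    }


# Two objects with different (assumed) resiliency choices, side by side.
examples = {
    "3-way replication": (1, 3),
    "3-of-5 erasure code": (3, 5),
}
for label, (m, n) in examples.items():
    print(label, erasure_code_costs(m, n))
```

Both example configurations survive the loss of two fragments, but replication stores three times the object size while the 3-of-5 code stores five thirds of it across more servers, which is the kind of trade-off a versatile infrastructure would let each object make independently.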

Measurements of PASIS, a prototype storage system based on the R/W-PF, demonstrate its versatility. These measurements also show that the R/W-PF provides fault-scalability. They further show the differing performance costs associated with various resiliency requirements and the workload-dependent merits of the storage mechanisms incorporated in the R/W-PF. The significant trade-offs associated with resiliency and storage mechanism choices underscore the importance of versatility in storage infrastructures.
