PARALLEL DATA LAB 

PDL Abstract

Backward Error Recovery in Redundant Disk Arrays

Appears in Proceedings of the 1994 Computer Measurement Group (CMG) Conference, Orlando FL, Vol. 1, December 4-9, 1994, pp. 63-74. Supersedes Carnegie Mellon University SCS Technical Report CMU-CS-94-193.

William V. Courtright II and Garth A. Gibson

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Redundant disk arrays are single-fault tolerant, incorporating a layer of error handling not found in nonredundant disk systems. Recovery from these errors is complex, due in part to the large number of erroneous states the system may reach. The established approach to error recovery in disk systems is to transition directly from an erroneous state to completion. This technique, known as forward error recovery, relies upon the context in which an error occurs to determine the steps required to reach completion, which implies that forward error recovery is design-specific. Forward error recovery requires the enumeration of all erroneous states the system may reach and the construction of a forward path from each erroneous state. We propose a method of error recovery that does not rely upon the enumeration of erroneous states or the context in which errors occur. When an error is encountered, we advocate mechanized recovery to an error-free state from which an operation may be retried. Using a form of backward error recovery, we are able to manage the complexity of error recovery in redundant disk arrays without sacrificing performance.
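
The idea of recovering to an error-free state and retrying can be illustrated with a minimal Python sketch. This is a generic illustration of backward error recovery, not the mechanism described in the paper. The names Step, StepError, make_write, and run_with_backward_recovery are hypothetical, the "stripe" is a toy dictionary rather than a real disk array, and each step is assumed to fail before it changes any state. The recovery loop never examines which step failed or why; it only undoes completed steps in reverse order and retries.

```python
from dataclasses import dataclass
from typing import Callable, List


class StepError(Exception):
    """Raised by a step to signal a fault (hypothetical, for illustration)."""


@dataclass
class Step:
    name: str
    do: Callable[[dict], None]    # applies the step's effect to the state
    undo: Callable[[dict], None]  # reverses that effect


def run_with_backward_recovery(steps: List[Step], state: dict,
                               max_attempts: int = 3) -> int:
    """Run all steps; on any fault, undo completed steps and retry.

    Returns the number of attempts used. Assumes a failing step raises
    before modifying the state, so only completed steps need undoing.
    """
    for attempt in range(1, max_attempts + 1):
        completed = []
        try:
            for step in steps:
                step.do(state)
                completed.append(step)
            return attempt
        except StepError:
            # Roll back to the error-free state that preceded the operation.
            for step in reversed(completed):
                step.undo(state)
    raise RuntimeError("operation failed after retries")


def make_write(key, new_value, fail_times=0):
    """Build a write step that faults on its first fail_times invocations."""
    remaining = {"n": fail_times}
    saved = {}

    def do(state):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise StepError(f"transient fault writing {key}")
        saved["old"] = state[key]  # remember the old value for undo
        state[key] = new_value

    def undo(state):
        state[key] = saved["old"]

    return Step(f"write {key}", do, undo)


# Toy stripe: parity is the XOR of the data block and one other block (9).
stripe = {"data": 5, "parity": 5 ^ 9}
steps = [
    make_write("data", 7),
    make_write("parity", 7 ^ 9, fail_times=1),  # faults once, then succeeds
]
attempts = run_with_backward_recovery(steps, stripe)
```

In this sketch the first attempt writes the data block, faults on the parity write, and restores the old data value; the second attempt completes both writes. Adding a new kind of step requires only its own do and undo actions, not a new recovery path for every state in which it might fail.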
