PDL Abstract

Fast, On-Line Failure Recovery in Redundant Disk Arrays

Appears in Proc. of the 23rd Annual International Symposium on Fault-Tolerant Computing, pp. 421-433, 1993.

Mark Holland, Garth A. Gibson, and Daniel P. Siewiorek

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213


This paper describes and evaluates two algorithms for performing on-line failure recovery (data reconstruction) in redundant disk arrays. It presents an implementation of disk-oriented reconstruction, a data recovery algorithm that allows the reconstruction process to absorb essentially all of the disk bandwidth not consumed by user processes. It then compares this algorithm to a previously proposed parallel stripe-oriented approach. The disk-oriented approach yields better overall failure-recovery performance.
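To make the contrast between the two approaches concrete, here is a minimal, single-threaded sketch. It assumes a toy RAID level 5 array held in memory, where a lost unit is rebuilt as the XOR of the surviving units of its stripe. It is not the authors' implementation. The names `make_array`, `reconstruct_stripe_oriented`, `reconstruct_disk_oriented`, `max_buffers` and `reads_per_round` are hypothetical. `reads_per_round` stands in for the bandwidth that user requests leave free on each disk, and timing, user I/O and the separate replacement-disk write process are ignored. The stripe-oriented function is a sequential stand-in; the parallel version the paper compares against would run several such stripe reconstructions concurrently.

```python
from functools import reduce
from typing import Dict, List, Tuple

Unit = bytes


def xor_units(a: Unit, b: Unit) -> Unit:
    """Bytewise XOR of two equal-length units."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_array(num_disks: int, num_stripes: int, unit_size: int) -> List[List[Unit]]:
    """Toy array: disks[d][s] is stripe s's unit on disk d; parity rotates (RAID 5 style)."""
    disks: List[List[Unit]] = [[b""] * num_stripes for _ in range(num_disks)]
    for s in range(num_stripes):
        parity_disk = s % num_disks
        parity = bytes(unit_size)
        for d in range(num_disks):
            if d != parity_disk:
                data = bytes((s * 31 + d * 7 + i) % 256 for i in range(unit_size))
                disks[d][s] = data
                parity = xor_units(parity, data)
        disks[parity_disk][s] = parity
    return disks


def reconstruct_stripe_oriented(disks: List[List[Unit]], failed: int) -> List[Unit]:
    """One stripe at a time: read every surviving unit of the stripe, XOR, then move on."""
    return [
        reduce(xor_units, [disks[d][s] for d in range(len(disks)) if d != failed])
        for s in range(len(disks[0]))
    ]


def reconstruct_disk_oriented(
    disks: List[List[Unit]],
    failed: int,
    max_buffers: int,
    reads_per_round: Dict[int, int],
) -> Tuple[List[Unit], int]:
    """Each surviving disk reads its own units sequentially and XORs them into
    per-stripe buffers; a disk may start a new stripe only while a buffer is free."""
    if max_buffers < 1:
        raise ValueError("need at least one stripe buffer")
    num_stripes = len(disks[0])
    survivors = [d for d in range(len(disks)) if d != failed]
    cursor = {d: 0 for d in survivors}   # next stripe each surviving disk will read
    acc: Dict[int, Unit] = {}             # stripe -> partial XOR held in a buffer
    count: Dict[int, int] = {}            # stripe -> units XORed in so far
    replacement: List[Unit] = [b""] * num_stripes
    peak = 0
    while any(cursor[d] < num_stripes for d in survivors):
        for d in survivors:
            for _ in range(reads_per_round[d]):  # hypothetical spare bandwidth per disk
                s = cursor[d]
                if s >= num_stripes or (s not in acc and len(acc) >= max_buffers):
                    break  # this disk is done, or no free buffer for a new stripe
                acc[s] = xor_units(acc[s], disks[d][s]) if s in acc else disks[d][s]
                count[s] = count.get(s, 0) + 1
                cursor[d] += 1
                peak = max(peak, len(acc))
                if count[s] == len(survivors):  # every survivor has contributed
                    replacement[s] = acc.pop(s)  # "write" to the replacement disk
                    del count[s]
    return replacement, peak


if __name__ == "__main__":
    disks = make_array(num_disks=5, num_stripes=20, unit_size=8)
    rebuilt, peak = reconstruct_disk_oriented(
        disks, failed=2, max_buffers=3, reads_per_round={0: 1, 1: 3, 3: 2, 4: 3}
    )
    assert rebuilt == disks[2] == reconstruct_stripe_oriented(disks, failed=2)
    print("rebuilt OK, peak buffers in use:", peak)
```

In this sketch each surviving disk advances independently, so a lightly loaded disk can run ahead of a busy one until the buffer pool fills. The stripe-at-a-time loop, by contrast, cannot start the next stripe until every read for the current one has finished.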

The paper evaluates performance via detailed simulation of two disk array architectures: the RAID level 5 organization and the declustered parity organization. The benefits of the disk-oriented algorithm can be achieved with controller or host buffer memory no larger than three disk tracks per disk in the array. The paper also investigates the tradeoffs involved in selecting the size of the disk accesses used by the failure recovery process.
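As a rough sizing aid, the buffer bound stated above is simply the number of disks times three times the track size. The sketch below assumes a hypothetical 21-disk array with 24 KiB tracks; neither figure comes from the paper, and `reconstruction_buffer_bytes` is an illustrative helper name.

```python
def reconstruction_buffer_bytes(num_disks: int, track_bytes: int, tracks_per_disk: int = 3) -> int:
    """Upper bound on reconstruction buffer memory: tracks_per_disk tracks for every disk."""
    return num_disks * tracks_per_disk * track_bytes


if __name__ == "__main__":
    # Hypothetical array: 21 disks, 24 KiB (24 * 1024 bytes) per track.
    need = reconstruction_buffer_bytes(num_disks=21, track_bytes=24 * 1024)
    print(need)  # 1548288
```

For those assumed values the bound works out to 1,548,288 bytes (roughly 1.5 MB) for the whole array.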
