PDL Abstract

Efficient Byzantine Fault Tolerance for Scalable Storage and Services

Carnegie Mellon School of Computer Science Ph.D. Dissertation CMU-CS-09-146. July 2009.

James Hendricks

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

As distributed systems grow in size and importance, they experience, and should tolerate, faults beyond simple component crashes. Unfortunately, tolerating arbitrary faults, also known as Byzantine faults, poses several challenges to system designers, often limiting performance, requiring additional hardware, or both. This dissertation presents new protocols that provide substantially better performance than previously demonstrated. The Byzantine fault-tolerant erasure-coded block storage protocol proposed in this thesis provides 40% higher write throughput than the best prior approach. The Byzantine fault-tolerant replicated state machine provides 2.2-2.9 times higher throughput than the best prior approach. Furthermore, the protocols presented in this dissertation require 25-33% fewer responsive servers than the nearest competitors. To enable these results, this dissertation introduces several new techniques, including homomorphic fingerprinting, partial encoding, and Byzantine Locking, that provide unprecedented scalability, higher throughput, lower latency, and lower computational overhead. This dissertation also considers new methods for analyzing the correctness of distributed systems in the presence of faulty clients. Distributed services and storage systems built using these techniques can provide Byzantine fault tolerance in a more efficient, higher-performance, and more scalable manner than previously thought possible.

KEYWORDS: Distributed Systems, Distributed Storage Systems, Fault Tolerance, Reliability, Security, Byzantine Fault Tolerance, Byzantine Fault-Tolerant Storage, Homomorphic Fingerprinting, Byzantine Locking, Partial Encoding