Egalitarian Distributed ConsensusCarnegie Mellon University Ph.D. Dissertation CMU-CS-14-133. August 2014.
Carnegie Mellon University
This thesis describes the design and implementation of state machine replication (SMR) that achieves near-perfect load balancing and availablility, near-optimal request processing latency (especially in the wide area), and performance robustness when confronted with failures and slow replicas.
Traditionally, practical replicated state machines have used leader-based implementations of consensus algorithms, because it has been believed that they provide the best performance -- highest throughput and lowest latency. At the same time, however, a leader-based approach has many drawbacks: the failure of the leader halts the entire replicated state machine temporarily, the speed of the entire set is determined by the speed of the leader, and, in geo-replicated scenarios, the distance to the leader causes remote clients ot experience high latency.
This work shows that leaderless approaches can not only solve these problems and provide the flexibility of a completely decentralized system, but they can also achieve substantially higher performance than leader-based protocols. We introduce a new variant of the Paxos protocol that we call Egalitarian Paxos. In Egalitarian Paxos all replicas perform the same functions simutaneously to ensure better load balancing and availability, lower commit latency and higher performance robustness when compared to previous Paxos variants. The benefits of Egalitarian Paxos are most apparent in the wide area, where its latency is optimal in may practical scenarios. We show -- both theoretically and empirically -- that Egalitarian Paxos has the aforementioned benfits when updating the state of a replicated state machine. We then apply the same leaderless design principle to improve the SMR read performance: quorum read leases generalize previously proposed time lease-based approaches to allow arbitrary sets of replicas to perform strongly consistent local reads for parts of the replicated state.
FULL TR: pdf