PARALLEL DATA LAB 

PDL Abstract

LazyBase: Trading Freshness for Performance in a Scalable Database

EuroSys 2012, April 10-13, 2012, Bern, Switzerland.

James Cipar, Gregory R. Ganger, Kimberly Keeton*, Charles B. Morrey III*, Craig A. N. Soules*, Alistair Veitch*

Dept. of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

* HP Labs

http://www.pdl.cmu.edu/

The LazyBase scalable database system is specialized for the growing class of data analysis applications that extract knowledge from large, rapidly changing data sets. It provides the scalability of popular NoSQL systems without the query-time complexity associated with their eventual consistency models, offering a clear consistency model and explicit per-query control over the trade-off between latency and result freshness. With an architecture designed around batching and pipelining of updates, LazyBase simultaneously ingests atomic batches of updates at a very high throughput and offers quick read queries to a stale-but-consistent version of the data. Although slightly stale results are sufficient for many analysis queries, fully up-to-date results can be obtained when necessary by also scanning updates still in the pipeline. Compared to the Cassandra NoSQL system, LazyBase provides 4X–5X faster update throughput and 4X faster read query throughput for range queries while remaining competitive for point queries. We demonstrate LazyBase's trade-off between query latency and result freshness as well as the benefits of its consistency model. We also demonstrate specific cases where Cassandra's consistency model is weaker than LazyBase's.
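The central idea above is that a query reads a consistent snapshot by default and only scans batches still in the pipeline when it asks for fresher results. The following is a minimal Python sketch of that per-query freshness choice, under loudly stated assumptions: `ToyLazyStore`, `Batch`, `ingest`, `apply_oldest`, `range_query`, and the `max_staleness` parameter are hypothetical names, not LazyBase's API, and the real system's multi-stage update pipeline is collapsed here into a single FIFO queue of atomic batches whose ingest times are assumed to be ascending.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Batch:
    # Hypothetical atomic batch of upserts: all of its updates become
    # visible to a query together, or not at all.
    ingest_time: float
    updates: Dict[str, int]


@dataclass
class ToyLazyStore:
    # Hypothetical toy model for illustration, not LazyBase's real design.
    snapshot: Dict[str, int] = field(default_factory=dict)  # applied, consistent data
    pipeline: List[Batch] = field(default_factory=list)     # accepted but not yet applied

    def ingest(self, batch: Batch) -> None:
        # Fast path: only queue the batch. Assumes callers ingest in
        # ascending ingest_time order, so the queue stays sorted.
        self.pipeline.append(batch)

    def apply_oldest(self) -> None:
        # Background work: fold the oldest whole batch into the snapshot.
        batch = self.pipeline.pop(0)
        self.snapshot.update(batch.updates)

    def range_query(self, lo: str, hi: str, now: float,
                    max_staleness: float) -> List[Tuple[str, int]]:
        # The result must reflect every batch ingested at or before `cutoff`.
        cutoff = now - max_staleness
        view = dict(self.snapshot)
        for batch in self.pipeline:      # oldest first
            if batch.ingest_time > cutoff:
                break                    # newer batches may be skipped
            view.update(batch.updates)   # extra work paid only by fresher queries
        return sorted((k, v) for k, v in view.items() if lo <= k <= hi)


store = ToyLazyStore()
store.ingest(Batch(ingest_time=1.0, updates={"a": 1, "b": 2}))
store.apply_oldest()
store.ingest(Batch(ingest_time=5.0, updates={"b": 20, "c": 3}))

# Lenient bound: the snapshot alone is fresh enough, so the pipeline is not scanned.
print(store.range_query("a", "c", now=6.0, max_staleness=10.0))  # [('a', 1), ('b', 2)]
# Strict bound: the pending batch is merged into the result.
print(store.range_query("a", "c", now=6.0, max_staleness=0.0))   # [('a', 1), ('b', 20), ('c', 3)]
```

Because whole batches are merged in ingest order and the scan stops at the first batch newer than the cutoff, every result in this toy model reflects a prefix of the ingested batches, so no query observes half of an atomic batch. A lenient staleness bound skips the pipeline scan entirely, which is where a query trades freshness for lower latency.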
