PARALLEL DATA LAB 

PDL Abstract

Towards Accurate and Fast Evaluation of Multi-Stage Log-Structured Designs

In 14th USENIX Conference on File and Storage Technologies (FAST'16), Santa Clara, CA, February 2016.

Hyeontaek Lim, David G. Andersen, Michael Kaminsky*

Carnegie Mellon University
*Intel Labs

Multi-stage log-structured (MSLS) designs, such as LevelDB, RocksDB, HBase, and Cassandra, are a family of storage system designs that exploit the high sequential write speeds of hard disks and flash drives by using multiple append-only data structures. As a first step towards accurate and fast evaluation of MSLS, we propose new analytic primitives and MSLS design models that quickly give accurate performance estimates. Our model can almost perfectly estimate the cost of inserts in LevelDB, whereas the conventional worst-case analysis gives 1.8–3.5x higher estimates than the actual cost. A few minutes of offline analysis using our model can find optimized system parameters that decrease LevelDB’s insert cost by up to 9.4–26.2%; our analytic primitives and model also suggest changes to RocksDB that reduce its insert cost by up to 32.0%, without reducing query performance or requiring extra memory.
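For readers unfamiliar with the kind of worst-case analysis the abstract contrasts against, the following is a minimal sketch of a commonly used back-of-envelope bound for leveled compaction. It is not the model proposed in the paper. It assumes (hypothetically) that each level is `growth_factor` times larger than the previous one and that moving data down one level rewrites, at worst, the moved data plus `growth_factor` times as much overlapping data in the next level. The function names and parameters are illustrative only.

```python
def levels_needed(total_size, first_level_size, growth_factor):
    """Count the levels required to hold total_size, assuming each level
    is growth_factor times larger than the one before it (an assumption
    of this sketch, not a parameter taken from the paper)."""
    levels = 1
    capacity = first_level_size
    while capacity < total_size:
        capacity *= growth_factor
        levels += 1
    return levels


def worst_case_write_amplification(total_size, first_level_size, growth_factor):
    """Pessimistic bound: every move to the next level rewrites the moved
    data once plus up to growth_factor times as much overlapping data."""
    transitions = levels_needed(total_size, first_level_size, growth_factor) - 1
    return transitions * (growth_factor + 1)


if __name__ == "__main__":
    # Hypothetical sizes in arbitrary units: 1000 units of data,
    # a first level of 10 units, and a growth factor of 10.
    print(levels_needed(1000, 10, 10))
    print(worst_case_write_amplification(1000, 10, 10))
```

A bound of this kind charges every merge its maximum possible overlap, which is one reason a worst-case estimate can sit well above measured insert cost.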

FULL PAPER: pdf