PARALLEL DATA LAB 

PDL Abstract

SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data

Proceedings of the VLDB Endowment, Vol. 10, No. 13, 2017.

Kai Ren, Qing Zheng, Joy Arulraj, Garth A. Gibson

Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Modern key-value stores often use write-optimized indexes and compact in-memory indexes to improve read and write performance. One popular write-optimized index is the Log-structured merge-tree (LSM-tree), which provides indexed access to write-intensive data. It has been increasingly used as a storage backbone for many services, including file system metadata management, graph processing engines, and machine learning feature storage engines. Existing LSM-tree implementations often exhibit high write amplification caused by compaction, and lack optimizations to maximize read performance on solid-state disks. The goal of this paper is to explore techniques that leverage common workload characteristics shared by many systems using key-value stores to reduce the read/write amplification overhead typically associated with general-purpose LSM-tree implementations. Our experiments show that by applying these design techniques, our new implementation of a key-value store, SlimDB, can be two to three times faster, use less memory to cache metadata indices, and show lower tail latency in read operations compared to popular LSM-tree implementations such as LevelDB and RocksDB.

FULL PAPER: pdf