Parallel Data Laboratory

Segment-Structured Key-Value Cache Management System

Key-value caches such as Memcached and CacheLib are widely used in today’s data centers to speed up data access, avoid repeated computation, and reduce bandwidth usage. Companies such as Twitter, Netflix, Pinterest, and Meta use petabytes of DRAM for caching, so even a small improvement in efficiency translates into a significant reduction in resource usage. Lower resource consumption not only reduces cost but also makes caching services more sustainable.

Unlike in CPU caches and the page cache, the efficiency of a key-value cache is determined not only by the eviction algorithm but also by several other factors. This project introduces the concept of a Key-Value Cache Management System, which consists of two components: cache replacement and space management. Cache replacement includes admission, expiration, and eviction, which together determine how effectively the cache chooses which objects to keep. Space management includes indexing, layout, and object metadata, which together determine space utilization. An efficient key-value cache must make effective replacement decisions while also achieving high space utilization. A log- or segment-structured cache offers one way to achieve both, and this project examines the challenges and opportunities in designing log- and segment-structured key-value caches.
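To make the two components concrete, here is a toy Python sketch of a segment-structured cache. It is an illustration under simplifying assumptions, not the design of Segcache, Pelikan, or GL-Cache. `SegmentCache`, `Segment`, `bucket_width`, and all other names are hypothetical. Object size is modeled as key length plus value length. Objects whose TTLs fall in the same bucket are appended to the same fixed-size segment, and eviction simply discards the oldest segment (FIFO). Admission (rejecting objects that can never fit), expiration, and eviction all act on whole segments, while the index maps each key only to the segment that holds its live copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity equality, so list.remove() drops the exact segment
class Segment:
    """Fixed-capacity, append-only group of objects from the same TTL bucket."""
    capacity: int                      # bytes this segment can hold
    ttl_bucket: int                    # TTL bucket whose writes this segment accepts
    used: int = 0                      # bytes appended so far (stale copies included)
    expire_at: float = 0.0             # latest expiration time of any object inside
    items: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


class SegmentCache:
    """Toy segment-structured cache; names and policies are hypothetical."""

    def __init__(self, segment_bytes: int, max_segments: int, bucket_width: float = 60.0):
        assert segment_bytes > 0 and max_segments > 0
        self.segment_bytes = segment_bytes
        self.max_segments = max_segments
        self.bucket_width = bucket_width
        self.segments: List[Segment] = []    # creation order, oldest first
        self.index: Dict[str, Segment] = {}  # key -> segment holding its live copy
        self.open: Dict[int, Segment] = {}   # TTL bucket -> segment accepting appends

    def _drop(self, seg: Segment) -> None:
        """Reclaim a whole segment, unlinking only keys whose live copy is in it."""
        for key in seg.items:
            if self.index.get(key) is seg:
                del self.index[key]
        if self.open.get(seg.ttl_bucket) is seg:
            del self.open[seg.ttl_bucket]
        self.segments.remove(seg)

    def expire(self, now: float) -> None:
        """Expiration: reclaim every segment whose objects have all expired."""
        for seg in [s for s in self.segments if s.expire_at <= now]:
            self._drop(seg)

    def set(self, key: str, value: bytes, ttl: float, now: float) -> bool:
        size = len(key) + len(value)         # hypothetical size model: key + value bytes
        if size > self.segment_bytes:
            return False                     # admission: object can never fit in a segment
        bucket = int(ttl // self.bucket_width)
        seg = self.open.get(bucket)
        if seg is None or not seg.has_room(size):
            self.expire(now)                 # reclaim expired space before evicting
            while len(self.segments) >= self.max_segments:
                self._drop(self.segments[0]) # eviction: drop the oldest segment (FIFO)
            seg = Segment(capacity=self.segment_bytes, ttl_bucket=bucket)
            self.segments.append(seg)
            self.open[bucket] = seg
        expire_at = now + ttl
        seg.items[key] = (value, expire_at)  # append; any older copy becomes garbage
        seg.used += size
        seg.expire_at = max(seg.expire_at, expire_at)
        self.index[key] = seg
        return True

    def get(self, key: str, now: float) -> Optional[bytes]:
        seg = self.index.get(key)            # indexing: key -> segment, not key -> slot
        if seg is None:
            return None
        value, expire_at = seg.items[key]
        return value if expire_at > now else None
```

Because writes are appends and space is reclaimed one segment at a time, the sketch never has to track per-object free space. The trade-off is that overwritten copies keep occupying their old segment until that segment expires or is evicted, and plain FIFO eviction can discard popular objects along with unpopular ones.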

People

FACULTY

Rashmi Vinayak

GRADUATE STUDENT

Juncheng Yang

INDUSTRY COLLABORATOR

Yao Yue (ex-Twitter)

Publications

GL-Cache: Group-level Learning for Efficient and High-performance Caching. Juncheng Yang, Ziming Mao, Yao Yue, K. V. Rashmi. 21st USENIX Conference on File and Storage Technologies (FAST '23). Feb. 21–23, 2023, Santa Clara, CA.
C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery. Juncheng Yang, Anirudh Sabnis, Daniel S. Berger, K. V. Rashmi, Ramesh K. Sitaraman. 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI '22). April 4–6, 2022, Renton, WA, USA.
Segcache: A Memory-efficient and Scalable In-memory Key-value Cache for Small Objects. Juncheng Yang, Yao Yue, K. V. Rashmi. 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI '21). Virtual Event, April 12–14, 2021. NSDI '21 Community Award and Best Paper Award.
A Large Scale Analysis of Hundreds of In-memory Cache Clusters at Twitter. Juncheng Yang, Yao Yue, K. V. Rashmi. 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI '20). Virtual Event, Nov. 4–6, 2020.

Code

In-memory key-value cache traces:
https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/open_source/
Segcache:
https://github.com/Thesys-lab/Segcache
Pelikan Segcache:
https://github.com/pelikan-io/pelikan (production system)
GL-Cache:
https://github.com/Thesys-lab/fast23-GLCache

Acknowledgements

We thank Twitter for allowing us to investigate the performance of its production caches and to open-source the large cache trace dataset. The project is supported by a Facebook Fellowship and NSF grants CNS 1901410 and CNS 1956271. Computation was performed on the PDL clusters, the CloudLab testbed, the Chameleon testbed, and AWS.

We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.