Key-value caches such as Memcached and CacheLib are widely used in today’s data centers to speed up data access, avoid repeated computation, and reduce bandwidth usage. Web applications such as Twitter, Netflix, Pinterest, and Meta use petabytes of DRAM for caching, so even a small improvement in efficiency leads to a significant reduction in resource usage. Reducing resource consumption not only lowers cost but also makes caching services more sustainable.
Unlike CPU caches and page caches, the efficiency of a key-value cache is determined not only by the eviction algorithm but also by several other factors. This project introduces the concept of the Key-Value Cache Management System, which consists of two components: cache replacement and space management. Cache replacement includes admission, expiration, and eviction, which together determine replacement effectiveness. Space management includes indexing, layout, and object metadata, which together determine space utilization. An efficient key-value cache needs both effective cache replacement and high space utilization. A log/segment-structured cache provides one way to achieve both, and this project looks into the challenges and opportunities of designing log/segment-structured key-value caches.
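To make the two components concrete, the following is a minimal sketch of a segment-structured cache in Python. It is an illustration only, not the design of any production system: the class names `Segment` and `SegmentCache`, the single cache-wide TTL, the size-based admission rule, and the FIFO choice of which segment to evict are all assumptions made for brevity. Objects are appended to fixed-size segments (layout), a hash table maps each key to its location (indexing), the expiration time is stored once per segment rather than per object (object metadata), and replacement happens one whole segment at a time.

```python
import time
from collections import deque


class Segment:
    """Append-only log whose objects share one expiration time."""

    def __init__(self, size, now, ttl):
        self.size = size
        self.data = bytearray()
        self.keys = []              # keys written here, used for bulk index cleanup
        self.expire_at = now + ttl  # metadata stored once per segment, not per object

    def has_room(self, nbytes):
        return len(self.data) + nbytes <= self.size

    def append(self, key, value):
        offset = len(self.data)
        self.data += value
        self.keys.append(key)
        return offset


class SegmentCache:
    """Hypothetical sketch: hash index over a FIFO queue of segments."""

    def __init__(self, num_segments, segment_size, ttl, clock=time.monotonic):
        self.num_segments = num_segments
        self.segment_size = segment_size
        self.ttl = ttl
        self.clock = clock
        self.index = {}          # key -> (segment, offset, length)
        self.segments = deque()  # oldest first; the last one is the active segment

    def _drop(self, seg):
        # Remove index entries that still point into the dropped segment.
        for key in seg.keys:
            entry = self.index.get(key)
            if entry is not None and entry[0] is seg:
                del self.index[key]

    def _new_segment(self):
        now = self.clock()
        # Expiration: segments are ordered by creation time, so expired ones are in front.
        while self.segments and self.segments[0].expire_at <= now:
            self._drop(self.segments.popleft())
        # Eviction (assumed FIFO policy): reclaim the oldest segment when full.
        if len(self.segments) == self.num_segments:
            self._drop(self.segments.popleft())
        seg = Segment(self.segment_size, now, self.ttl)
        self.segments.append(seg)
        return seg

    def set(self, key, value):
        # Admission (assumed rule): reject objects that cannot fit in one segment.
        if len(value) > self.segment_size:
            return False
        active = self.segments[-1] if self.segments else None
        if (active is None or not active.has_room(len(value))
                or self.clock() >= active.expire_at):
            active = self._new_segment()
        offset = active.append(key, value)
        self.index[key] = (active, offset, len(value))
        return True

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        seg, offset, length = entry
        if self.clock() >= seg.expire_at:  # lazily treat expired objects as misses
            return None
        return bytes(seg.data[offset:offset + length])
```

In this sketch, an overwritten object leaves its old bytes in the log until the segment is reclaimed, and an object expires when its segment does, which can be slightly earlier than the object's own TTL. Both are simplifications of the sketch rather than claims about any particular system.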
Yao Yue (ex-Twitter)
In-memory key-value cache traces:
https://github.com/pelikan-io/pelikan (production system)
We thank Twitter for allowing us to investigate the performance of production caches and open-source the large cache trace dataset. The project is supported by a Facebook Fellowship and NSF grants CNS 1901410 and CNS 1956271. The computation was performed on the PDL clusters, the CloudLab testbed, the Chameleon testbed, and AWS.
We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.