PARALLEL DATA LAB 

PDL Talk Series 2021

June 10, 2021


TIME: 12:00 noon to approximately 1:00 pm EDT
PLACE: Virtual - a Zoom link will be emailed closer to the seminar

SPEAKER: Nathan Beckmann
Assistant Professor, Computer Science Dept., CMU


Making Data Access Faster and Cheaper via Ubiquitous Flash Caching
Caches are critical to achieving good performance in datacenter applications. However, as data sizes continue to grow, caches themselves have grown to the point where storing them in DRAM is very expensive. There is a huge opportunity to save cost and energy by shifting caches to media like flash, but doing so comes with a host of challenges. This talk will cover recent and ongoing work in the Parallel Data Lab at CMU that solves these problems to enable flash caching even on the most challenging workloads.

I will describe the CacheLib library, developed at Facebook, which makes it easy to spin up new hybrid (i.e., DRAM + flash) caches and consolidates best practices in cache design. CacheLib is widely deployed at Facebook, yielding important lessons from several years in production.

I will then discuss two ongoing projects that aim to make flash caches widely applicable. First, flash caches add a new dimension to cache design because of their limited write budget, i.e., the number of bytes that can be written without wearing out the device. We are designing new cache admission policies that "spend writes wisely" using a combination of algorithmic analysis and machine learning. Second, Kangaroo is a flash cache optimized for billions of tiny objects, an adversarial workload that forces traditional flash-cache designs into either a prohibitive DRAM footprint or a prohibitive flash write rate. Kangaroo combines existing cache designs in a new way to improve hit ratios and reduce cost while limiting DRAM usage and flash writes.
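To make the "spend writes wisely" idea concrete, below is a minimal standalone sketch of a write-budget-aware flash admission policy. It is purely illustrative: the FlashAdmitter and WriteBudget classes, the hits-per-byte heuristic, and all parameters are hypothetical and are not CacheLib's API or the speaker's actual designs. The sketch admits an object to flash only when its estimated hits per byte written clear a threshold that tightens as the write budget is consumed.

    // Illustrative sketch only: a standalone, budget-aware flash admission policy.
    // Names (FlashAdmitter, WriteBudget) are hypothetical, not CacheLib's API.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Tracks how many bytes may still be written to flash in the current
    // interval (a hypothetical stand-in for the device's endurance budget).
    struct WriteBudget {
        uint64_t bytesRemaining;
        void consume(uint64_t bytes) {
            bytesRemaining = (bytes > bytesRemaining) ? 0 : bytesRemaining - bytes;
        }
    };

    // Admits an object to flash only when its estimated hits-per-byte-written
    // clears a threshold that rises as the write budget is depleted.
    class FlashAdmitter {
    public:
        FlashAdmitter(uint64_t budgetBytes, double baseThreshold)
            : budget_{budgetBytes},
              fullBudget_(static_cast<double>(budgetBytes)),
              baseThreshold_(baseThreshold) {}

        bool shouldAdmit(const std::string& key, uint64_t objectBytes) {
            // Simple reuse predictor: count how often this key has been seen.
            // (A real system might use frequency sketches or learned predictors.)
            double expectedHits = static_cast<double>(seenCount_[key]++);

            // Spend writes more conservatively as the budget runs low.
            double usedFraction =
                1.0 - static_cast<double>(budget_.bytesRemaining) / fullBudget_;
            double threshold = baseThreshold_ * (1.0 + usedFraction);

            double hitsPerByte = expectedHits / static_cast<double>(objectBytes);
            bool admit = hitsPerByte >= threshold;
            if (admit) {
                budget_.consume(objectBytes);
            }
            return admit;
        }

    private:
        WriteBudget budget_;
        double fullBudget_;
        double baseThreshold_;
        std::unordered_map<std::string, uint64_t> seenCount_;
    };

    int main() {
        // 1 GB write budget for this interval; base threshold of 0.0005 hits/byte.
        FlashAdmitter admitter(1ULL << 30, 0.0005);

        // Early misses on "photo:42" are rejected (little reuse evidence yet);
        // repeated misses build evidence until the object clears the threshold.
        for (int i = 0; i < 4; ++i) {
            bool admit = admitter.shouldAdmit("photo:42", 4096);
            std::cout << "request " << i << ": admit=" << std::boolalpha << admit << "\n";
        }
        return 0;
    }

The design choice illustrated here is the trade-off named in the abstract: every flash write draws down a finite endurance budget, so admission must weigh an object's expected future hits against the bytes it costs to write.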

BIO: I am an assistant professor in the Computer Science Department and (by courtesy) the Electrical and Computer Engineering Department at Carnegie Mellon University. My work focuses on reducing data movement in computer systems, from datacenters to the Internet of Things, by keeping data closer to where it is needed. I earned my PhD from MIT in 2015 under the supervision of Daniel Sanchez. My awards include the George M. Sprowls Award for “outstanding PhD thesis in Computer Science at MIT”, the NSF CAREER Award, Google Faculty Research Awards in 2017 and 2019, and the Google Research Scholar Award in 2021.


CONTACTS


Director, Parallel Data Lab
VOICE: (412) 268-1297


Executive Director, Parallel Data Lab
VOICE: (412) 268-5485


PDL Administrative Manager
VOICE: (412) 268-6716