NOTE SPECIAL DAY - FRIDAY, SEPT. 13, 2019
DATE: Friday, September 13, 2019
TIME: 12:00 - 1:00 pm
PLACE: Panther Hollow Conference Room, 4th Floor, RMCIC
SPEAKERS: Pat Helland and Rohit Agrawal, Salesforce
TALK 1: Write Amplification Versus Read Perspiration
This talk is a summary of my soon-to-be-released column in ACM Queue titled “Write Amplification versus Read Perspiration”. In this short discussion, we observe that there is a strong pattern in which writing data incurs an obligation to do more work to make it easy to read that data later. We frequently talk about write amplification to describe the extra work we do in many cases to ease reading. We propose the nomenclature read perspiration to describe the challenges of reading data when we didn’t work hard to organize it.
Pat Helland has been implementing transaction systems, databases, application platforms, distributed systems, fault-tolerant systems, and messaging systems since 1978. For recreation, he occasionally writes technical papers. He currently works at Salesforce.
TALK 2: Compression Tradeoffs And Choices For An LSM Key Value Store
In this talk we discuss LSM compression for a KV store. In our KV store, we write to an underlying shared storage system that models data as named extents (up to 2GB) and variable-length fragments contained within each extent. Fragments are at most 1MB and are the atomic unit of read and write. Our KV store reads fragments into 64K buffers for scanning and random reads.
Our compression has two facets: key-compression and fragment-compression. Key-compression is particularly effective because the data we store in our LSM is very key-intensive, sometimes with key-only records stored. Effective key-compression dramatically improves the efficacy of our block cache. Fragment-compression is a powerful technique for us to save storage, but the cost-benefit is proportional to the longevity of the compressed fragment, leading to interesting tradeoffs with the LSM tree and its various levels.
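As a rough illustration of why key-compression can pay off on key-intensive data, the sketch below applies shared-prefix encoding to a sorted run of keys. This is only one common form of key compression and is an assumption here; the abstract does not say which scheme the speakers' KV store uses. The function names and sample keys are hypothetical.

```python
from typing import List, Tuple


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def compress_keys(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix).

    Assumes the keys are already sorted, as they would be within an LSM run.
    """
    entries = []
    prev = b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decompress_keys(entries: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys by reusing the prefix of the previous key."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical sample keys with long common prefixes.
keys = [b"user:1001:email", b"user:1001:name", b"user:1002:email"]
entries = compress_keys(keys)
raw_bytes = sum(len(k) for k in keys)
suffix_bytes = sum(len(suffix) for _, suffix in entries)
```

In this toy run only the suffixes and a small shared-length counter need to be stored per key, so more keys fit in a fixed-size cached block, which is one plausible reason key-compression would help a block cache.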
Rohit Agrawal graduated from CMU in 2017. He is a software engineer at Salesforce and is currently focused on Database (Postgres) Internals in Transaction Processing and Storage. He is interested in software systems in general that are reliable and can run efficiently at scale.
SEMINAR HOST: Andy Pavlo
SDI SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/