High Performance Direct Access NVM Storage Redundancy
Non-volatile memory (NVM) changes the way performance-sensitive applications interact with persistent data. NVM storage combines DRAM-like access latencies and granularities with disk-like durability. NVM storage in a DIMM form factor resides on the memory bus, and its data is load/store accessible: it moves to and from CPU caches at cache-line granularity. Applications can access NVM storage via conventional I/O interfaces or via direct access (DAX). Conventional I/O interfaces, such as the file system, allow applications to use NVM without any modification. With DAX, applications map NVM data into their address space and access it directly with load and store instructions, without the overhead of interposed system software.
Production storage demands more than non-volatility and performance; it is also expected to provide features that bolster data integrity. Production storage systems maintain end-to-end page checksums, over and above on-device error-correcting codes (ECCs), to protect data from loss or corruption caused by firmware bugs such as lost writes and misdirected reads or writes. They also maintain cross-page or cross-device parity to recover data after such loss or corruption. Whereas some features, e.g., background scrubbing, extend to NVM storage trivially, these conventional redundancy mechanisms fit poorly.
Maintaining page checksums and cross-page parity efficiently is particularly challenging with the DAX interface. Without interposed system software, it is hard to identify the data reads that should trigger redundancy verification and the data updates that should trigger redundancy updates. The fine access granularity (e.g., 64-byte cache lines) is also incongruent with the much larger page size over which checksums are typically maintained for space efficiency (e.g., 4 KB pages); this incongruence magnifies the overhead of checksum updates and verification.
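The following is a minimal sketch of the two conventional mechanisms discussed above, not the implementation used in this project. It assumes CRC32 as the page checksum and simple XOR parity across a stripe of three pages; the helper names (`page_checksum`, `stripe_parity`, `recover_page`) are hypothetical.

```python
import zlib
from functools import reduce

PAGE_SIZE = 4096   # checksum granularity (4 KB page, as in the text)
CACHE_LINE = 64    # DAX access granularity (64-byte cache line)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    n = len(a)
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(n, "little")

def page_checksum(page: bytes) -> int:
    """Checksum over a whole page (CRC32 is an assumption of this sketch)."""
    return zlib.crc32(page)

def stripe_parity(pages) -> bytes:
    """Cross-page parity: XOR of all pages in the stripe."""
    return reduce(xor_bytes, pages)

def recover_page(surviving_pages, parity: bytes) -> bytes:
    """Rebuild the one missing page from the parity and the other pages."""
    return reduce(xor_bytes, surviving_pages, parity)

# A stripe of three pages with distinct contents.
pages = [bytes([i + 1]) * PAGE_SIZE for i in range(3)]
checksums = [page_checksum(p) for p in pages]
parity = stripe_parity(pages)

# Simulate silent corruption of page 1 (e.g., a misdirected write).
corrupted = bytearray(pages[1])
corrupted[100] ^= 0xFF
detected = page_checksum(bytes(corrupted)) != checksums[1]

# The checksum mismatch triggers recovery from parity.
recovered = recover_page([pages[0], pages[2]], parity)

# Granularity mismatch: one 64-byte store forces a checksum over 4096 bytes.
amplification = PAGE_SIZE // CACHE_LINE
```

In this sketch, a single cache-line store requires checksumming 64 times as many bytes as were written, which illustrates the magnified overhead described above.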
We propose a lazy checksum and parity maintenance scheme, ANON, that creates a trade-off between time to coverage and performance. ANON’s periodic background thread updates redundancy for pages that an application wrote to in the last period. ANON repurposes page table entry dirty bits to identify such pages with stale redundancy. By delaying the redundancy updates, ANON amortizes the overhead of updating checksums and parity.
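A lazy scheme of this kind can be simulated in a few lines. The sketch below is illustrative only: a Python set stands in for the page table entry dirty bits, CRC32 stands in for the page checksum, and the class and method names are hypothetical. Parity is omitted; it would be refreshed for the same dirty pages.

```python
import zlib

PAGE_SIZE = 4096

class LazyRedundancy:
    """Toy model of lazily maintained page checksums (hypothetical names)."""

    def __init__(self, num_pages: int):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.checksums = [zlib.crc32(p) for p in self.pages]
        self.dirty = set()  # stand-in for page table entry dirty bits

    def store(self, addr: int, data: bytes) -> None:
        """Application store: touches data only, as a DAX store would.
        Assumes the store does not cross a page boundary."""
        page, off = divmod(addr, PAGE_SIZE)
        self.pages[page][off:off + len(data)] = data
        self.dirty.add(page)  # in hardware, the dirty bit is set as a side effect

    def background_pass(self):
        """Periodic pass: refresh checksums of pages written in the last period."""
        updated = sorted(self.dirty)
        for page in updated:
            self.checksums[page] = zlib.crc32(self.pages[page])
        self.dirty.clear()
        return updated

    def is_covered(self, page: int) -> bool:
        """True if the stored checksum currently matches the page contents."""
        return (page not in self.dirty
                and self.checksums[page] == zlib.crc32(self.pages[page]))

nvm = LazyRedundancy(num_pages=4)
for i in range(10):                      # ten cache-line stores to page 1
    nvm.store(PAGE_SIZE + 64 * i, b"x" * 64)
stale_before = not nvm.is_covered(1)     # page 1 is unprotected until the next pass
updated = nvm.background_pass()          # one checksum computation covers all ten stores
```

The ten stores are amortized into a single checksum computation, at the cost of a window during which page 1 has stale redundancy; the length of the period controls that trade-off.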
For applications that demand stronger data-integrity guarantees than ANON provides, such as in-line redundancy maintenance, we propose a hardware controller, Tvarak, interposed in the data path. Tvarak resides with the last-level cache controllers and provides efficient in-line redundancy maintenance. To reduce the impact of redundancy maintenance on NVM bandwidth, Tvarak caches checksums and parity, and stores data diffs (used for incremental checksum and parity updates) in the last-level cache. Tvarak has low dedicated resource requirements (e.g., a small on-Tvarak cache), and benefits from configurable fractions of shared system resources (e.g., the last-level cache).
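The role of data diffs in incremental parity updates can be shown with XOR parity. This is a software sketch of the arithmetic only, assuming XOR parity over a stripe of three cache lines; it does not model the controller, its caches, or incremental checksum updates, and the function names are hypothetical.

```python
from functools import reduce

CACHE_LINE = 64

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    n = len(a)
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(n, "little")

def data_diff(old_line: bytes, new_line: bytes) -> bytes:
    """Diff between the old and new contents of one cache line."""
    return xor_bytes(old_line, new_line)

def apply_diff_to_parity(parity_line: bytes, diff: bytes) -> bytes:
    """Incremental update: new_parity = old_parity XOR (old_data XOR new_data)."""
    return xor_bytes(parity_line, diff)

# Three data cache lines protected by one parity cache line.
lines = [bytes([v]) * CACHE_LINE for v in (1, 2, 3)]
parity = reduce(xor_bytes, lines)

# Overwrite line 1 and update parity from the diff alone.
new_line = b"\xAA" * CACHE_LINE
diff = data_diff(lines[1], new_line)
parity_incremental = apply_diff_to_parity(parity, diff)

# Reference result: recompute parity from scratch over the whole stripe.
lines[1] = new_line
parity_full = reduce(xor_bytes, lines)
```

The incremental result equals the full recomputation without reading the other lines in the stripe, which is why keeping the diff close to the cache avoids extra NVM reads.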
Our ongoing work aims to reduce the cost of redundant NVM storage by storing parity on SSDs. We envision logging the parity writes in a small NVM region to avoid impacting application write performance, and streaming this log to an SSD in large sequential chunks in the background.
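One way to picture the envisioned design is the sketch below. It is speculative and not based on any published implementation: a `bytearray` stands in for the small NVM log region, an in-memory `io.BytesIO` stands in for the SSD, and the record layout (8-byte offset followed by a 64-byte parity line) and the `ParityLog` class are assumptions made for illustration.

```python
import io
import struct

PARITY_LINE = 64
RECORD_SIZE = 8 + PARITY_LINE   # assumed layout: 8-byte offset + parity bytes

class ParityLog:
    """Toy parity log: fast appends, background draining in large chunks."""

    def __init__(self, ssd, chunk_bytes: int):
        self.ssd = ssd                  # file-like object standing in for the SSD
        self.chunk_bytes = chunk_bytes  # size of each sequential SSD write
        self.log = bytearray()          # stand-in for the small NVM log region
        self.flushes = 0

    def append(self, parity_offset: int, parity_line: bytes) -> None:
        """Foreground path: record the parity write in the log only."""
        self.log += struct.pack("<Q", parity_offset) + parity_line

    def drain(self) -> None:
        """Background path: stream full chunks to the SSD sequentially."""
        while len(self.log) >= self.chunk_bytes:
            self.ssd.write(self.log[:self.chunk_bytes])
            del self.log[:self.chunk_bytes]
            self.flushes += 1

ssd = io.BytesIO()
plog = ParityLog(ssd, chunk_bytes=8 * RECORD_SIZE)   # 8 records per chunk
for i in range(20):
    plog.append(i * PARITY_LINE, bytes([i]) * PARITY_LINE)
plog.drain()
```

With 20 appended records and 8 records per chunk, the drain issues two large writes and leaves four records in the log for a later pass; the application-facing `append` never waits on the SSD.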
- High Availability in Cheap Distributed Key Value Storage. Thomas Kim, Daniel Lin-Kit Wong, Gregory R. Ganger, Michael Kaminsky, David G. Andersen. SoCC ’20, October 19–21, 2020, Virtual Event, USA.
- TVARAK: Software-Managed Hardware Offload for Redundancy in Direct-Access NVM Storage. Rajat Kateja, Nathan Beckmann, Greg Ganger. 47th International Symposium on Computer Architecture, May 30 – June 3, 2020, Virtual Valencia, Spain.
- Vilamb: Low Overhead Asynchronous Redundancy for Direct Access NVM. Rajat Kateja, Andy Pavlo, Greg Ganger. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-20-101, April 2020. Supersedes CMU-PDL-19-101.
- TVARAK: Software-Managed Hardware Offload for DAX NVM Storage Redundancy. Rajat Kateja, Nathan Beckmann, Greg Ganger. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-19-105, Aug 2019.
- Lazy Redundancy for NVM Storage: Handing the Performance-Reliability Tradeoff to Applications. Rajat Kateja, Andy Pavlo, Greg Ganger. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-19-101, April 2019.
We thank the members and companies of the PDL Consortium: Alibaba Group, Amazon, Datrium, Facebook, Google, Hewlett Packard Enterprise, Hitachi Ltd., Intel Corporation, IBM, Microsoft Research, NetApp, Inc., Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Seagate Technology, Two Sigma, and Western Digital for their interest, insights, feedback, and support.