PARALLEL DATA LAB 

PDL Abstract

RAIZN: Redundant Array of Independent Zoned Namespaces

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-22-101, January 2022. Superseded by ASPLOS ’23, March 25–29, 2023, Vancouver, BC, Canada.

Thomas Kim*, George Amvrosiadis*, Jekyeom Jeon*, Huaicheng Li*, David G. Andersen*†, Greg Ganger*,
Michael Kaminsky*†, and Matias Bjørling‡

* Carnegie Mellon University
† BrdgAi
‡ Western Digital Corporation

http://www.pdl.cmu.edu/

Zoned Namespace (ZNS) SSDs are the most recent evolution of host-managed flash-based storage, enabling improved performance at a lower cost per byte than traditional block-interface SSDs. To date, there is no support for arranging these new devices in redundant arrays (RAID), which may limit their deployment in environments where RAID is the favored mechanism for increasing reliability and throughput. This paper identifies key challenges in the design of a RAID-like mechanism for ZNS SSDs, such as the need to manage metadata updates and persist partial stripe writes when the device’s interface offers no overwrite semantics. We present the design, implementation, and evaluation of RAIZN, a logical volume manager that exposes a ZNS interface and stripes data and parity across ZNS SSDs.
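To make the idea of striping data and parity across append-only devices concrete, the following is a minimal Python sketch of RAID-5-style XOR parity over in-memory zones. It is an illustration under stated assumptions, not RAIZN's implementation: the `Zone` class, the `write_stripe` and `rebuild_unit` helpers, and the parity rotation rule are all hypothetical, and the abstract does not specify how RAIZN lays out parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Zone:
    """Hypothetical append-only zone: writes land at the write pointer only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    @property
    def write_pointer(self):
        return len(self.data)

    def write(self, offset, payload):
        # No overwrite semantics: any offset other than the write pointer fails.
        if offset != self.write_pointer:
            raise ValueError("zones accept sequential writes only")
        if offset + len(payload) > self.capacity:
            raise ValueError("write exceeds zone capacity")
        self.data += payload


def write_stripe(zones, stripe_index, units):
    """Append one full stripe: len(zones) - 1 data units plus one parity unit."""
    assert len(units) == len(zones) - 1
    # Rotating parity across devices is an assumption made for this sketch.
    parity_dev = stripe_index % len(zones)
    payloads = list(units)
    payloads.insert(parity_dev, xor_blocks(units))
    for zone, payload in zip(zones, payloads):
        zone.write(zone.write_pointer, payload)
    return parity_dev


def rebuild_unit(zones, failed_dev, stripe_index, unit_size):
    """Recover a lost unit by XORing the surviving units of the same stripe."""
    start = stripe_index * unit_size
    survivors = [bytes(zone.data[start:start + unit_size])
                 for i, zone in enumerate(zones) if i != failed_dev]
    return xor_blocks(survivors)


zones = [Zone(capacity=16) for _ in range(3)]
write_stripe(zones, 0, [b"abcd", b"efgh"])
write_stripe(zones, 1, [b"ijkl", b"mnop"])
recovered = rebuild_unit(zones, failed_dev=2, stripe_index=1, unit_size=4)
```

In this sketch, attempting to rewrite an already-written offset raises an error, which hints at why partial stripe writes are hard on ZNS devices: a parity unit that has already been written cannot simply be updated in place when more data for the same stripe arrives.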

Experiments show that RAIZN delivers the full expected performance of the aggregate device set, successfully addressing the key challenges posed by the ZNS interface. RAIZN achieves throughput and latency comparable to the equivalent Linux software RAID implementation running on conventional SSDs built on the same hardware platform, and it exceeds that implementation’s performance once device-level garbage collection begins to slow the conventional SSDs. Importantly, RAIZN retains ZNS’s opportunities for increased application performance, allowing higher-level software (e.g., F2FS or RocksDB) to carefully control garbage collection. For example, this allows RAIZN to maintain consistent performance in scenarios where conventional SSD arrays experience a throughput drop of up to 87.5% due to device-level garbage collection.

KEYWORDS: Storage, Reliability, Zoned Storage, ZNS, RAID

FULL TR: pdf
CONFERENCE VERSION: pdf