PDL Abstract

DeltaFS: Exascale File Systems Scale Better Without Dedicated Servers

PDSW2015: 10th Parallel Data Storage Workshop, held in conjunction with SC15, Austin, TX, November 16, 2015.

Qing Zheng, Kai Ren, Garth A. Gibson, Bradley W. Settlemyer*, Gary Grider*

Carnegie Mellon University
*Los Alamos National Laboratory

High-performance computing fault tolerance depends on scalable parallel file system performance. For more than a decade, scalable bandwidth has been available from the object storage systems that underlie modern parallel file systems, and recently we have seen demonstrations of scalable parallel metadata using dynamic partitioning of the namespace over multiple metadata servers. But even these scalable parallel file systems require significant numbers of dedicated servers, and some workloads still experience bottlenecks. We envision exascale parallel file systems that do not have any dedicated server machines. Instead, a parallel job instantiates a file system namespace service in client middleware that operates only on scalable object storage and communicates with other jobs by sharing or publishing namespace snapshots. Experiments show that our serverless file system design, DeltaFS, performs metadata operations orders of magnitude faster than traditional file system architectures.