Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-13-107. May 2013.
Kai Ren, Swapnil Patil, Kartik Kulkarni, Adit Madan, Garth A. Gibson
School of Computer Science
Carnegie Mellon University
Modern parallel and cluster file systems provide highly scalable I/O bandwidth by enabling parallel access to file data. Unfortunately, metadata access does not benefit from parallel data transfer, so metadata performance scaling is less common. To support metadata-intensive workloads, we offer a middleware design that layers on top of existing cluster file systems and adds support for load-balanced, high-performance metadata operations without sacrificing data bandwidth. The core idea is to integrate a distributed indexing mechanism with a metadata-optimized on-disk Log-Structured Merge (LSM) tree layout. The integration requires several optimizations, including cross-server split operations with minimal data migration and the decoupling of data and metadata paths. To demonstrate the feasibility of our approach, we implemented a prototype middleware layer, GIGA+TableFS, and evaluated it with a Panasas parallel file system (PanFS). GIGA+TableFS improves the metadata performance of PanFS by as much as an order of magnitude, while still performing comparably on data-intensive workloads.
KEYWORDS: Parallel File System, Metadata, Load Balance, Log-Structured Merge Tree
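The abstract pairs a distributed directory index with an LSM-tree metadata layout but does not spell out the mechanics. The Python sketch below is a minimal illustration under our own assumptions, not the GIGA+TableFS code. It follows the incremental hash-partitioning idea behind GIGA+: a directory partition splits in two once it grows too large, and only the entries whose next hash bit is set migrate. It also uses a TableFS-style row key of <parent inode, name hash>. `SPLIT_THRESHOLD`, `name_hash` (MD5), `server_of` (round-robin placement), and the in-memory dicts standing in for per-server LSM trees are all hypothetical.

```python
import hashlib
import struct
from dataclasses import dataclass, field

SPLIT_THRESHOLD = 4  # hypothetical: a partition splits once it holds more entries than this


def name_hash(name: str) -> int:
    """Deterministic 64-bit hash of a file name (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")


def lsm_key(parent_ino: int, name: str) -> bytes:
    """Row key <parent inode, name hash>; big-endian packing keeps one directory's rows adjacent."""
    return struct.pack(">QQ", parent_ino, name_hash(name))


@dataclass
class Partition:
    index: int
    depth: int  # owns every hash h with h % 2**depth == index
    rows: dict = field(default_factory=dict)  # lsm_key -> file name (stand-in for an LSM tree)


class Directory:
    def __init__(self, ino: int, num_servers: int):
        self.ino = ino
        self.num_servers = num_servers
        self.parts = {0: Partition(0, 0)}  # one partition covering the whole hash space

    def server_of(self, index: int) -> int:
        # hypothetical placement policy: spread partitions round-robin over servers
        return index % self.num_servers

    def locate(self, h: int) -> Partition:
        # the partitions tile the hash space, so exactly one (index, depth) pair matches
        max_depth = max(p.depth for p in self.parts.values())
        for d in range(max_depth, -1, -1):
            p = self.parts.get(h % (1 << d))
            if p is not None and p.depth == d:
                return p
        raise RuntimeError("partitions do not cover the hash space")

    def insert(self, name: str) -> None:
        p = self.locate(name_hash(name))
        p.rows[lsm_key(self.ino, name)] = name
        if len(p.rows) > SPLIT_THRESHOLD:
            self.split(p)

    def split(self, p: Partition) -> None:
        # the child takes hashes whose bit `old` is 1; only those rows migrate
        old = p.depth
        child = Partition(p.index + (1 << old), old + 1)
        p.depth = old + 1
        for key, name in list(p.rows.items()):
            if (name_hash(name) >> old) & 1:
                child.rows[key] = p.rows.pop(key)
        self.parts[child.index] = child

    def readdir(self) -> list:
        # merge every partition's rows in key order, as an LSM range scan would return them
        rows = sorted((k, n) for p in self.parts.values() for k, n in p.rows.items())
        return [name for _, name in rows]


d = Directory(ino=42, num_servers=3)
names = [f"file{i}" for i in range(20)]
for n in names:
    d.insert(n)
print(len(d.parts), "partitions on servers", sorted({d.server_of(i) for i in d.parts}))
```

In this sketch a split moves roughly half of one partition's rows to a single new partition, and no other partition is touched. That is one simple way to keep migration small. Because every row key starts with the parent inode, a lookup is a point query on `<parent, hash>`, while listing a directory reduces to a scan over one key prefix.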