PARALLEL DATA LAB 

PDL Abstract

Giga+TableFS on PanFS: Scaling Metadata Performance on Cluster File Systems

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-13-101. January 2013.

Kartik Kulkarni, Kai Ren, Swapnil Patil, Garth A. Gibson

School of Computer Science
Carnegie Mellon University

http://www.pdl.cmu.edu/

Modern file systems provide scalable performance for large-file data management. For metadata management, however, the usual approach is to have a single point, or only a few points, of metadata service (MDS). File systems today face demands such as managing an exponentially growing number of files, serving as key-value stores, and checkpointing, all of which are highly metadata-intensive and are usually bottlenecked by centralized MDS schemes.

To overcome this metadata bottleneck, we evaluate a scalable MDS layer for existing cluster file systems built from two components: Giga+, a high-performance distributed index that works without synchronization and serialization, and TableFS, a file system with an embedded NoSQL database built on LevelDB, a modern key-value store. We take a layered approach to scaling metadata performance, one that needs no hardware infrastructure upgrade in existing storage clusters. In addition to scaling metadata performance and increasing it severalfold, avoiding metadata hotspots, and packing small files, our MDS layer adds little or no overhead to the data throughput and resource utilization of the underlying cluster.
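
The layering described above can be illustrated with a minimal sketch. It is not the report's implementation: it assumes a fixed number of metadata servers and a static hash partition, whereas Giga+ is a distributed index with its own partitioning scheme, and it uses an in-memory dictionary as a stand-in for the LevelDB store that TableFS embeds. All names (`partition_for`, `MetadataServer`, `LayeredMDS`, `NUM_SERVERS`) are hypothetical.

```python
import hashlib

NUM_SERVERS = 4  # hypothetical cluster size, chosen only for illustration


def partition_for(parent_inode: int, name: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a directory entry to a server by hashing (parent inode, name).

    Assumption: a static modulo partition, used here only to show how
    entries of one large directory can spread across several servers.
    """
    digest = hashlib.sha1(f"{parent_inode}/{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


class MetadataServer:
    """One metadata partition; the dict stands in for a key-value store."""

    def __init__(self):
        self.table = {}

    def create(self, parent_inode, name, attrs, small_data=b""):
        key = (parent_inode, name)
        if key in self.table:
            raise FileExistsError(name)
        # Attributes and small-file contents share one record, which is
        # the sense in which small files are "packed" with their metadata.
        self.table[key] = {"attrs": dict(attrs), "data": small_data}

    def lookup(self, parent_inode, name):
        return self.table[(parent_inode, name)]


class LayeredMDS:
    """Routes each metadata operation to the server that owns the entry."""

    def __init__(self, num_servers=NUM_SERVERS):
        self.servers = [MetadataServer() for _ in range(num_servers)]

    def _server(self, parent_inode, name):
        return self.servers[partition_for(parent_inode, name, len(self.servers))]

    def create(self, parent_inode, name, attrs, small_data=b""):
        self._server(parent_inode, name).create(parent_inode, name, attrs, small_data)

    def lookup(self, parent_inode, name):
        return self._server(parent_inode, name).lookup(parent_inode, name)


mds = LayeredMDS()
for i in range(1000):
    # 1000 files created in a single directory (parent inode 1).
    mds.create(1, f"ckpt.{i}", {"mode": 0o644, "size": 4}, small_data=b"data")
```

In this sketch, creates in a single directory are spread over all servers rather than landing on one, which is the hotspot-avoidance idea in miniature; the underlying cluster file system would still hold large file data.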

KEYWORDS: metadata scaling, metadata bottleneck, layered MDS, checkpointing, PanFS, Giga+, TableFS, NoSQL
