PARALLEL DATA LAB 

PDL Abstract

Metadata Efficiency in a Comprehensive Versioning File System

Carnegie Mellon University Technical Report CMU-CS-02-145, May 2002. Superseded by the version published at the Second USENIX Conference on File and Storage Technologies, San Francisco, CA, Mar 31 - Apr 2, 2003.

Craig A. N. Soules, Garth R. Goodson, John D. Strunk, Gregory R. Ganger

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

A comprehensive versioning file system creates and retains a new file version for every WRITE or other modification request. The resulting history of file modifications provides a detailed view to tools and administrators seeking to investigate a suspect system state. Conventional versioning systems do not efficiently record the many prior versions that result. In particular, the versioned metadata they keep consumes almost as much space as the versioned data. This paper examines two space-efficient metadata structures for versioning file systems and describes their integration into the Comprehensive Versioning File System (CVFS). Journal-based metadata encodes each metadata version into a single journal entry; CVFS uses this structure for inodes and indirect blocks, reducing the associated space requirements by 80%. Multiversion b-trees extend the per-entry key with a timestamp and keep current and historical entries in a single tree; CVFS uses this structure for directories, reducing the associated space requirements by 99%. Experiments with CVFS verify that its current-version performance is similar to that of non-versioning file systems. Although access to historical versions is slower than in conventional versioning systems, checkpointing is shown to mitigate this effect.
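
The two structures named in the abstract can be illustrated with a minimal Python sketch. It is not the CVFS implementation. The class and method names (JournaledInode, VersionedDirectory, as_of, lookup) are hypothetical, the journal is assumed to hold undo records that are applied backward from the current version, timestamps are assumed to be non-decreasing, and a sorted list stands in for a real b-tree so that only the timestamp-extended key is shown.

```python
import bisect


class JournaledInode:
    """Current metadata plus one small journal entry per change (illustrative)."""

    def __init__(self, **attrs):
        self.current = dict(attrs)
        self.journal = []  # (timestamp, field, old_value), oldest first

    def update(self, ts, field, value):
        # Record only what changed instead of copying the whole inode.
        self.journal.append((ts, field, self.current.get(field)))
        self.current[field] = value

    def as_of(self, ts):
        # Assumption: rebuild an old version by undoing entries newer than ts,
        # newest first, starting from the current version.
        view = dict(self.current)
        for entry_ts, field, old_value in reversed(self.journal):
            if entry_ts <= ts:
                break
            view[field] = old_value
        return view


class VersionedDirectory:
    """Entries keyed by (name, timestamp); a sorted list stands in for a b-tree."""

    def __init__(self):
        self.keys = []    # sorted (name, timestamp) pairs
        self.values = []  # inode number, or None as a deletion marker

    def put(self, name, ts, inode):
        i = bisect.bisect_right(self.keys, (name, ts))
        self.keys.insert(i, (name, ts))
        self.values.insert(i, inode)

    def remove(self, name, ts):
        # Deletion adds a marker so that older entries remain searchable.
        self.put(name, ts, None)

    def lookup(self, name, ts):
        # Find the newest entry for `name` whose timestamp is <= ts.
        i = bisect.bisect_right(self.keys, (name, ts))
        if i == 0 or self.keys[i - 1][0] != name:
            return None
        return self.values[i - 1]


inode = JournaledInode(size=0, mode=0o644)
inode.update(10, "size", 100)
inode.update(20, "size", 250)
inode.update(30, "mode", 0o600)

directory = VersionedDirectory()
directory.put("a.txt", 10, 7)
directory.remove("a.txt", 20)
directory.put("a.txt", 30, 9)
```

In this sketch, reading the current version touches no journal entries, while inode.as_of(ts) walks backward through every entry newer than ts. That is consistent with the abstract's observation that historical access is slower and that checkpointing (storing an occasional full copy to bound the walk) mitigates the cost. Current and historical directory entries share one ordered structure, so directory.lookup(name, ts) is a single search.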

FULL PAPER, TR VERSION: pdf / postscript
FULL PAPER, CONFERENCE VERSION: pdf / postscript