PARALLEL DATA LAB 

PDL Abstract

Attribute-Based Prediction of File Properties

Harvard Computer Science Group Technical Report TR-14-03, December 2003.

Daniel Ellard*, Michael Mesnier‡, Eno Thereska†, Gregory R. Ganger†, Margo Seltzer*

* Harvard University Division of Engineering and Applied Sciences
Computer Science Group
Harvard University
Cambridge, MA

† Parallel Data Laboratory, Carnegie Mellon University
‡ Intel Corporation and Parallel Data Laboratory, Carnegie Mellon University
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

We present evidence that attributes known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file, such as its ultimate size, lifespan, and access pattern. More importantly, we show that these relationships can be exploited to automatically generate predictive models for these properties, and that the resulting predictions are sufficiently accurate to enable optimizations.
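To make the idea concrete, the following is a minimal sketch of predicting a file property from creation-time attributes. It is not the method used in the report: it simply remembers the most common outcome seen for each combination of file-name extension, permission mode, and owner, and falls back to the overall most common outcome for unseen combinations. The names `creation_attributes` and `MajorityPredictor`, the lifespan labels, and the tiny training set are all hypothetical and chosen only for illustration.

```python
import os
from collections import Counter, defaultdict


def creation_attributes(name, mode, owner):
    """Return the attributes known at creation time (hypothetical feature set)."""
    extension = os.path.splitext(name)[1].lower()
    return (extension, mode, owner)


class MajorityPredictor:
    """Predict a label by majority vote over files with matching attributes."""

    def __init__(self):
        self.by_attrs = defaultdict(Counter)  # attribute tuple -> label counts
        self.overall = Counter()              # label counts across all files

    def train(self, records):
        # Each record is (name, mode, owner, observed_label).
        for name, mode, owner, label in records:
            self.by_attrs[creation_attributes(name, mode, owner)][label] += 1
            self.overall[label] += 1

    def predict(self, name, mode, owner):
        counts = self.by_attrs.get(creation_attributes(name, mode, owner))
        if not counts:
            counts = self.overall  # unseen attributes: use the global majority
        if not counts:
            return None            # nothing has been observed yet
        return counts.most_common(1)[0][0]


# Synthetic, made-up observations used only to exercise the sketch.
records = [
    ("mail.lock", 0o600, "alice", "short-lived"),
    ("inbox.lock", 0o600, "alice", "short-lived"),
    ("sent.lock", 0o600, "alice", "long-lived"),
    ("thesis.tex", 0o644, "alice", "long-lived"),
    ("notes.tex", 0o644, "alice", "long-lived"),
]

model = MajorityPredictor()
model.train(records)
print(model.predict("new.lock", 0o600, "alice"))
print(model.predict("paper.tex", 0o644, "alice"))
```

A file system could consult such a prediction at creation time, for example to choose a placement or caching policy, before any data has been written to the file.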
