Date: October 15, 1998
Time: Noon
Place: Wean Hall 8220

Speaker: Tom M. Kroeger, Univ. of California, Santa Cruz; Carnegie Mellon University

Modeling File Reference Patterns to Improve Caching Decisions

Abstract:
Many I/O systems benefit significantly from prefetching at differnt levels. For disk controllers this is usually done by prefetching the next sequential block, within a file it is usually the next sequential data. However, the abstraction of a file has no intrinsic concept of a common successor. We have shown that files do have relationships, and believe that these relationships exist because of their function. For example in a Unix file system, when the executable for the program make is referenced, it is likely that references to files such as cc, as, and ld will soon follow. There is significant information to be gained from the modeling of file access patterns. However, to successfully track the reference patterns of a file system, a model must efficiently handle the large number of distinct files, as well as the constant addition and removal of files. To further complicate matters, the permanent nature of a file system requires that a model be able to run indefinitely, forcing any realistic model to address issues of long-term model buildup and adapting to changing patterns.

We have adapted a multi-order context modeling technique used in the text compression method Prediction by Partial Match (PPM) to track file reference patterns. We then further modified these modeling techniques to address the issues of model space, computational complexity and adapting to changing patterns. The result is that model space requirements are reduced from O(a^m) to O(a) (where a is the number of objects modeled, and m is the model order, or tree depth). The model RAM requirements are then further reduced by paging each partition with the file in it's first order node. In trace driven simulations we saw a 4 megabyte cache with predictive prefetching receive as many cache hits as a 90 megabyte LRU cache without prefetching.

We are currently working to implement a predictive prefetching agent that uses such models to prefetch file data in Linux, as well as continuing to investigate effective models for file reference patterns, and how they can be used to improve caching.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/