DATE: Thursday, February 22, 2001
TIME: Noon - 1 pm
PLACE: Hammerschlag Hall D210

Eitan Bachmat

EMC Corporation

On the Merits of the IRM Model as a Replacement for Trace Driven
Simulations in System Design

The IRM model (independent reference model) is an ancient and trivial model of user access patterns that describes non uniform random activity. It is an order zero Markov chain and is consistent with exponential interarrival times. The model's simplicity allows one to perform many useful calculations, some of which will be noted in the talk, assuming that it holds. Unfortunately it is thought by the vast majority of system experts to be completely unrealistic. We will examine the IRM model in the context of caching and more particularly caching in disk array storage systems. As early as 1971, Aho Denning and Ullman produced an optimal caching algorithm under the IRM model assumptions. The algorithm itself, like the model, is completely trivial. It is so trivial that it is probably the only caching algorithm that does not necessitate a trace driven simulation in order to compute its effectiveness (hit ratio) regardless of model assumptions. We will explain how the IRM model can be slightly generalized to accomodate locality of reference and that the optimal algorithm remains optimal for a cache enlargment problem assuming that cache sizes are large. We then show that the major technological trends towards higher capacity disk drives, larger caches, higher bandwidth and mirrored SAN environments all work in favor of easing the implementation barrier for the algorithm.

We then explain why an algorithm that behaves well under the assumptions of the IRM model and which can be reasonably implemented does not need to be validated by trace driven simulations which use real traces.

We will further argue that in many (but not all) cases trace driven simulations with real traces using real equipment (or simulations of real equipment) may be highly misleading and detrimental to good system design and that mathematical theorems regarding abstract systems which assume artificial but somewhat reasonable models are in some sense better indicators of reality and are much more helpful.

Finally, despite the last paragraph, we will present results of trace driven simulations taken with real traces that overwhelmingly validate the usefulness of the IRM optimal caching algorithm.

Eitan Bachmat received his B.A. and M.A. in mathematics from Hebrew U. and his Ph.D. in mathematics from M.I.T. He has been a performance consultant with EMC Corp. for the last several years. In that capacity, he has pioneered the performance group at EMC and has been involved in the design of many of the performance related algorithms in Symmetrix, EMC's main storage product. Since 1998 he is also a lecturer in the CS Dept. at Ben Gurion U. in Israel. His current research activity focuses mainly on storage systems. He is also interested in number theory, algebraic geometry and bioinformatics.

For Further Seminar Info: