Thursday, February 22, 2001
Noon - 1 pm
Hammerschlag Hall D210
On the Merits of the IRM Model as a Replacement
for Trace Driven
Simulations in System Design
The IRM model (independent reference model) is an ancient and trivial
model of user access patterns that describes non uniform random activity.
It is an order zero Markov chain and is consistent with exponential
interarrival times. The model's simplicity allows one to perform many
useful calculations, some of which will be noted in the talk,
assuming that it holds. Unfortunately it is thought by the vast majority
of system experts to be completely unrealistic. We will examine the IRM
model in the context of caching and more particularly caching in disk
array storage systems. As early as 1971, Aho Denning and Ullman produced
an optimal caching algorithm under the IRM model assumptions.
The algorithm itself, like the model, is completely trivial. It is so
trivial that it is probably the only caching algorithm that does not
necessitate a trace driven simulation in order to compute its effectiveness
(hit ratio) regardless of model assumptions. We will explain how the IRM
model can be slightly generalized to accomodate locality of reference and
that the optimal algorithm remains optimal for a cache enlargment problem
assuming that cache sizes are large. We then show that the major
technological trends towards higher capacity disk drives, larger caches,
higher bandwidth and mirrored SAN environments all work in favor of easing
the implementation barrier for the algorithm.
We then explain why an algorithm that behaves well under
the assumptions of the IRM model and which can be reasonably implemented does
not need to be validated by trace driven simulations which use real traces.
We will further argue that in many (but not all) cases trace driven
simulations with real traces using real equipment (or simulations
of real equipment) may be highly misleading and detrimental to
good system design and that mathematical theorems regarding abstract
systems which assume artificial but somewhat reasonable models are
in some sense better indicators of reality and are much more helpful.
Finally, despite the last paragraph, we will present results of trace
driven simulations taken with real traces that overwhelmingly
validate the usefulness of the IRM optimal caching algorithm.
Eitan Bachmat received his B.A. and M.A. in mathematics
from Hebrew U. and his Ph.D. in mathematics from M.I.T.
He has been a performance consultant with EMC Corp. for the last several
years. In that capacity, he has pioneered the performance group at EMC and
has been involved in the design of many of the performance related algorithms
in Symmetrix, EMC's main storage product. Since 1998 he is also a lecturer
in the CS Dept. at Ben Gurion U. in Israel. His current research
activity focuses mainly on storage systems. He is also interested in
number theory, algebraic geometry and bioinformatics.
For Further Seminar Info: