Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic
Appears in 18th International Conference on Data Engineering, 2002. Supercedes Carnegie Mellon University SCS Technical Report CMU-CS-01-101.
M. Wang, T. Madhyastha, N.H. Chan, S. Papadimitriou, C. Faloutsos
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Network, web, and disk I/O traffic are usually bursty, self-similar
and therefore can not be modeled adequately with Poisson arrivals. However,
we do want to model these types of traffic and to generate realistic
traces, because of obvious applications for disk scheduling, network
management, web server design. Previous models (like fractional Brownian
motion, FARIMA, etc.) tried to capture the burstiness. However,
the proposed models either require too many parameters to fit and/or
require prohibitively large (quadratic) time to generate large traces.
We propose a simple, parsimonious method, the b-model , which solves
both problems: It requires just one parameter, and it can easily generate
large traces. In addition, it has many more attractive properties: (a)
With our proposed estimation algorithm, it requires just a single pass
over the actual trace to estimate b. For example, a one-day-long disk
trace in milliseconds contains about 86Mb data points and requires about
3 minutes for model fitting and 5 minutes for generation. (b) The resulting
synthetic traces are very realistic: our experiments on real disk and
web traces show that our synthetic traces match the real ones very well
in terms of queuing behavior.