PARALLEL DATA LAB 

PDL Abstract

Capturing the Spatio-Temporal Behavior of Real Traffic Data

Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept. 2002.

Mengzhi Wang, Anastassia Ailamaki, and Christos Faloutsos

School of Computer Science
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Traffic such as disk and memory accesses typically exhibits burstiness, temporal locality, and spatial locality. Much recent ground-breaking work addresses temporal modeling (self-similarity, etc.) of disk and web traffic, yielding several statistical models that generate realistic series of time-stamps. However, no existing work generates realistic traces for both time and location (e.g., block-id). In fact, beyond qualitative speculation, it is not even known whether or how the time-stamps are correlated with the locations, nor how to measure this correlation, let alone how to reproduce it realistically.

These are exactly the problems we solve here: (a) we propose 'entropy plots' to quantify the spatial/temporal correlation (or lack of it), and (b) we propose a new model, the 'PQRS' model, that captures all the characteristics of real spatio-temporal traffic. Our model can generate traffic that is bursty or uniform in time and bursty or uniform in space, and it can mimic the correlation between space and time whenever such correlation exists. Moreover, it requires very few parameters (p, q, r, and the grand total of disk/memory accesses), and it has linear scalability in computing these parameters. Experiments with multiple real data sets (disk traces from HP Labs, TPC-C memory traces) show that our model can mimic real traces very well, while the only obvious alternative, the independence assumption, leads to errors more than 60 times larger.
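
The abstract does not spell out how the generator or the entropy plots are defined, so the following is only a rough sketch under stated assumptions. It assumes that the space-time square is split recursively into four quadrants, that each access falls into one of them with probability p, q, r, or s = 1 - p - q - r, and that an entropy plot records the Shannon entropy of the time, location, and joint distributions at each resolution. The function names, the `levels` and `seed` parameters, and the assignment of probabilities to quadrants are all hypothetical.

```python
import math
import random
from collections import Counter


def pqrs_trace(p, q, r, levels, count, seed=0):
    """Draw `count` (time, location) cells on a 2**levels x 2**levels grid.

    Assumption: at every level the current square is split into four
    quadrants and one is picked with probability p, q, r, or s = 1-p-q-r.
    Which quadrant receives which probability is a guess made here.
    """
    s = 1.0 - p - q - r
    if min(p, q, r) < 0 or s < -1e-12:
        raise ValueError("p, q, r must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (time bit, location bit)
    weights = [p, q, r, max(s, 0.0)]
    trace = []
    for _ in range(count):
        t = x = 0
        for _ in range(levels):
            dt, dx = rng.choices(quadrants, weights=weights)[0]
            t = 2 * t + dt  # append one more bit of time resolution
            x = 2 * x + dx  # append one more bit of location resolution
        trace.append((t, x))
    return trace


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_plot(trace, levels):
    """Return rows (n, H_time, H_location, H_joint) for n = 1..levels.

    At resolution n only the top n bits of each coordinate are kept,
    so the grid has 2**n intervals per axis.
    """
    rows = []
    for n in range(1, levels + 1):
        shift = levels - n
        times = [t >> shift for t, _ in trace]
        places = [x >> shift for _, x in trace]
        joint = list(zip(times, places))
        rows.append((n, entropy(times), entropy(places), entropy(joint)))
    return rows
```

Under these assumptions, a joint entropy close to the sum of the two marginal entropies would suggest that time and location are nearly independent, while a clearly smaller joint entropy would indicate correlation. A trace can be generated and inspected with, for example, `entropy_plot(pqrs_trace(0.6, 0.2, 0.1, levels=8, count=10000), 8)`.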
