Contact: Greg Ganger

Data Mining meets Traffic Modeling

Traffic modeling of storage workloads is extremely helpful in evaluating system designs. The work has two aspects. The first is to discover and quantify the most important features of the traffic data, such as temporal burstiness and spatial locality; it is harder still to determine how these features affect the performance of real systems under that traffic. The second is to build an efficient statistical model that generates synthetic workloads that behave like the real ones. Traditional models such as Poisson are inadequate for generating timestamps for strongly bursty traffic, let alone for generating multi-dimensional traffic.

This project addresses both problems. Our previous work focused on the spatio-temporal behavior of traffic data, specifically the temporal burstiness and spatial locality of I/O workloads. Our proposed tool, the entropy plot, quantifies the temporal burstiness and spatial locality in traffic data. The B-model generates timestamps for synthetic traffic that imitate the temporal burstiness of real traffic data. The PQRS model goes one step further by generating both the timestamps and the request locations for synthetic traces. Ongoing work extends the model to handle more dimensions.
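To make the B-model and the entropy plot more concrete, here is a minimal Python sketch. It rests on assumptions this page does not spell out: that the B-model is a recursive biased split (each time interval gives a fraction `bias` of its traffic to one randomly chosen half and the rest to the other), and that the entropy plot reports the entropy of the traffic distribution at each aggregation level. The function names and parameters are hypothetical, and the sketch produces per-slot request counts rather than individual timestamps.

```python
import math
import random


def b_model(total, bias, levels, rng):
    """Spread `total` requests over 2**levels time slots (assumed B-model form).

    Each interval is split in two; one half, chosen at random, receives a
    fraction `bias` of the interval's traffic and the other half the rest.
    """
    counts = [float(total)]
    for _ in range(levels):
        refined = []
        for c in counts:
            heavy, light = c * bias, c * (1.0 - bias)
            if rng.random() < 0.5:
                refined.extend((heavy, light))
            else:
                refined.extend((light, heavy))
        counts = refined
    return counts


def entropy_by_level(counts):
    """Return entropies in bits; entry k is the entropy over 2**k slots.

    Assumes len(counts) is a power of two. Coarser levels are obtained by
    summing adjacent pairs of slots.
    """
    total = sum(counts)
    entropies = []
    current = list(counts)
    while True:
        h = -sum((c / total) * math.log2(c / total) for c in current if c > 0)
        entropies.append(h)
        if len(current) == 1:
            break
        current = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
    entropies.reverse()  # coarsest level (a single slot) first
    return entropies


if __name__ == "__main__":
    trace = b_model(total=10_000, bias=0.8, levels=10, rng=random.Random(42))
    for level, h in enumerate(entropy_by_level(trace)):
        print(level, round(h, 3))
```

Under these assumptions the entropy grows linearly with the aggregation level. A bias of 0.5 gives uniform traffic and a slope of one bit per level, while a bias closer to 1 gives burstier traffic and a flatter line, so the slope can serve as a single-number summary of burstiness.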

2- and 3-dimensional representations of real traffic data showing burstiness along time and space.



Anastassia Ailamaki
Christos Faloutsos


Mengzhi Wang



We thank the members and companies of the PDL Consortium: Alibaba Group, Amazon, Datrium, Facebook, Google, Hewlett Packard Enterprise, Hitachi Ltd., Intel Corporation, IBM, Micron, Microsoft Research, NetApp, Inc., Oracle Corporation, Salesforce, Samsung Semiconductor Inc., Seagate Technology, and Two Sigma for their interest, insights, feedback, and support.




Last updated 15 March, 2012