DATE: Thursday, November 11, 2004
TIME: Noon - 1 pm
PLACE: Wean Hall 8220

Jun (Jim) Xu
Georgia Tech

Network Data Streaming — A Computer Scientist's Journey in Signal Processing

Accurate traffic measurement and monitoring is critical for network management and operation. With the rapid growth of the Internet, network link speeds have become faster every year to accommodate more Internet users. To accurately measure and monitor these high-speed links becomes a very challenging problem. Data streaming has been proposed as a viable solution for measuring and monitoring high-speed links in large networks. Data streaming is concerned with processing a long stream of data items in one pass using a small working memory in order to answer a class of queries regarding the stream. While data streaming has been studied by the database researchers, most of their results can not be directly applied to network data streaming. A key design challenge, which makes most database streaming algorithms inapplicable, is that the online processing of each data item (packet) has to finish within tens of nanoseconds, which is orders of magnitude more stringent than in database applications (on the order of milliseconds).

To address this challenge, we developed a new methodology called "Lossy data structure + Bayesian statistics = Accurate streaming", and applied it to produce three results that are among the earliest in this area. The first result, based on our new streaming data structure called Space-Code Bloom Filter, solves the open problem of measuring per-flow traffic without keeping per-flow state. The second result is the design of the most resource-efficient and scalable IP traceback scheme to date, based on our new coincident streaming/sampling} technique. This work also pioneers the application of information theory to optimizing streaming data structures. The third result solves the open problem of accurately estimating flow size distribution without keeping per-flow state. Our algorithm is orders of magnitude more accurate than the traditional sampling-based approach.

We also discovered that the applications of data streaming go far beyond network measurement and monitoring. Recently, we have successfully applied the data streaming techniques to a seemingly unrelated area: query routing in an unstructured P2P network. In this talk, I will talk about these new applications as well as our theory of formulating the data streaming problems in CS as the channel design problem in EE.

Jun (Jim) Xu is an Assistant Professor in the College of Computing at Georgia Institute of Technology. He received his Ph.D. in Computer and Information Science from The Ohio State University in 2000. His current research interests include data streaming algorithms for the measurement and monitoring of computer networks, algorithms and data structures for computer networks, network security, and performance modeling and simulation. He received the NSF CAREER award in 2003 for his ongoing efforts in establishing fundamental lower bound and tradeoff results in networking. He is a co-author of a paper that won the Best Student Paper Award from 2004 ACM Sigmetrics/IFIP Performance joint conference, and the faculty advisor of the student winners.

For Further Seminar Info Contact:
or visit