INTEL RESEARCH SEMINAR

DATE: Thursday, April 13, 2004
TIME: Noon - 1:30 pm
PLACE: Intel Seminar (417 S. Craig Street - 3rd Floor)
INTEL EVENTS PAGE: http://intel-research.net/pittsburgh/Seminars.asp

SPEAKER:
Mehul Shah
UC Berkeley

TITLE:
Flux: A Mechanism for Building Robust, Scalable Dataflows

ABSTRACT:
We present techniques for robustly scaling high-throughput, 24x7, data-stream processing applications. Examples of such applications include intrusion or denial-of-service detection, click-stream processing, and online analysis of financial quote streams. As part of the TelegraphCQ project, we implement these applications using a general-purpose continuous query (CQ) engine that executes long-running dataflows. To scale the performance of these dataflows, we parallelize them across a cluster of workstations.

For these applications, high availability, fault tolerance, and scalability are important goals. These goals are challenging to achieve on a cluster because machines are bound to fail and load imbalances are likely to arise. We describe the design of Flux, a reusable communication abstraction that enables long-running parallel dataflows to adapt to these problems. Flux encapsulates mechanisms that allow a dataflow to mask faults and to recover from them automatically as they occur during execution. The same mechanisms are also used to periodically rebalance a dataflow and keep it running efficiently. Thus, simply by composing a parallel dataflow using Flux, an application developer can make the dataflow robust.

BIO:
Mehul Shah graduated from MIT in 1996 with undergraduate degrees in Physics and Computer Science. He received his MEng degree from the MIT EECS department in 1997. Currently, he is a member of the TelegraphCQ project at UC Berkeley, where his research aims to provide high availability, fault tolerance, and load balancing for parallel continuous query (CQ) dataflows. His broader interests include fault tolerance, CQ systems, adaptive query optimization, parallel data-intensive applications, distributed computing, and indexing techniques.

For Further Seminar Info:
Contact Kim Kaan, 412-605-1203, or visit http://www.intel-research.net.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/