DATE: Thursday, May 13, 2010
TIME: 12:00 pm - 1:00 pm
PLACE: Gates Center 8102

SPEAKER: Mr. Jiaqi Tan, Member of Technical Staff, DSO National Laboratories, Singapore

TITLE: Diagnosing Problems and Visualizing Behavior in MapReduce Systems

ABSTRACT:
The distributed nature and large scale of MapReduce programs and systems pose two challenges for using existing profiling and debugging tools to understand MapReduce programs: existing tools produce too much information at that scale, and they do not expose program behavior in terms of Maps and Reduces.

We have developed a novel, non-intrusive log-analysis technique that extracts state-machine views of the control and data flows in MapReduce behavior from the native logs of Hadoop MapReduce systems and synthesizes these views into a unified, causal view of MapReduce program behavior. This technique enables us to visualize MapReduce programs in terms of MapReduce-specific behaviors, greatly aiding operators in reasoning about and debugging performance problems in MapReduce systems in a scalable fashion. We validate our technique and visualizations using a real-world workload, showing how to understand the structure and performance behavior of MapReduce jobs and how to diagnose injected performance problems that reproduce real-world ones.

BIO:
Jiaqi Tan is currently a Member of Technical Staff at DSO National Laboratories, Singapore. Prior to joining DSO, he was a member of the Parallel Data Laboratory at Carnegie Mellon, where his research focused on diagnosing problems in and visualizing the behavior of MapReduce systems, particularly Hadoop. He interned at Yahoo! in Sunnyvale, CA, in summer 2009 to transition diagnosis and visualization tools into the Hadoop Chukwa subproject. His current research interests are in diagnosing failures in cloud computing platforms, static-analysis techniques for finding bugs in software, and intrusion detection. He has an MS and a BS in Computer Science from Carnegie Mellon.

Visitor Coordinator: Jennifer Engleson, 412-268-3729

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/