PARALLEL DATA LAB 

PDL Abstract

Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop

Workshop on Hot Topics in Cloud Computing (HotCloud '09), San Diego, CA, June 15, 2009. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-103, May 2009.

Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan

Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Mochi, a new visual, log-analysis based debugging tool, correlates Hadoop’s behavior in space, time and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior that users can use to reason about and debug performance issues. Using the Yahoo! M45 Hadoop cluster, we provide examples of Mochi’s value in revealing a Hadoop job’s structure, in optimizing real-world workloads, and in identifying anomalous Hadoop behavior.

KEYWORDS: Visualization, Log analysis, Performance Debugging, Hadoop, MapReduce

FULL PAPER: pdf
FULL TR: pdf