Ganesha: Black-Box Fault Diagnosis for MapReduce Systems
Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-112, September 2008.
Xinghao Pan, Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan
Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Ganesha aims to diagnose faults transparently in MapReduce systems, by analyzing OS-level metrics alone. Ganesha’s approach is based on peer-symmetry under fault-free conditions, and can diagnose faults that manifest asymmetrically at nodes within a MapReduce system. While our training is performed on smaller Hadoop clusters and for specific workloads, our approach allows us to diagnose faults in larger Hadoop clusters and for unencountered workloads. We also candidly highlight faults that escape Ganesha’s black-box diagnosis.
KEYWORDS: Hadoop, Fault diagnosis, Clustering
FULL TR: pdf