PDL Abstract
Otus: Resource Attribution in Data-Intensive Clusters
MapReduce'11, June 8, 2011, San Jose, California, USA
Kai Ren, Julio López, Garth A. Gibson
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Frameworks for large-scale data-intensive applications, such as Hadoop and Dryad, have gained tremendous popularity. Understanding the resource requirements of these frameworks and the performance characteristics of distributed applications is inherently difficult. We present an approach, based on resource attribution, that aims to facilitate performance analyses of distributed data-intensive applications. This approach is embodied in Otus, a monitoring tool that attributes resource usage to jobs and services in Hadoop clusters. Otus collects and correlates performance metrics from distributed components and provides views that display time series of these metrics, filtered and aggregated using multiple criteria. Our evaluation shows that this approach can be deployed without incurring major overheads. Our experience with Otus in a production cluster suggests that it is effective at helping users and cluster administrators with application performance analysis and troubleshooting.
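The abstract does not describe how Otus performs attribution internally, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each node reports per-process samples and that some mapping from (node, pid) to a job id is available; it filters the samples to one metric and aggregates them into per-job time series. All names (`Sample`, `attribute`, `pid_to_job`, the `io_read_bytes` metric) are hypothetical, and the metric is assumed to be a per-interval delta, so summing samples within a time bucket is meaningful.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: int  # seconds since epoch
    node: str       # host that reported the sample
    pid: int        # OS process id on that host
    metric: str     # hypothetical metric name, e.g. "io_read_bytes"
    value: float    # assumed to be a delta since the previous sample


def attribute(samples, pid_to_job, metric, bucket_secs=60):
    """Sum one metric per (job, time bucket) across all nodes.

    pid_to_job maps (node, pid) -> job id (a hypothetical input).
    Samples from processes with no known job go to an "other" series.
    """
    series = defaultdict(lambda: defaultdict(float))
    for s in samples:
        if s.metric != metric:
            continue  # filter: keep only the requested metric
        job = pid_to_job.get((s.node, s.pid), "other")
        bucket = s.timestamp - s.timestamp % bucket_secs  # align to bucket start
        series[job][bucket] += s.value  # aggregate across processes and nodes
    # Return plain dicts whose entries are ordered by bucket start time.
    return {job: dict(sorted(b.items())) for job, b in series.items()}


if __name__ == "__main__":
    samples = [
        Sample(0, "n1", 101, "io_read_bytes", 500.0),
        Sample(30, "n1", 101, "io_read_bytes", 300.0),
        Sample(10, "n2", 202, "io_read_bytes", 400.0),
        Sample(70, "n2", 303, "io_read_bytes", 200.0),
        Sample(5, "n1", 101, "rss_mb", 512.0),
    ]
    pid_to_job = {("n1", 101): "job_1", ("n2", 202): "job_1"}
    print(attribute(samples, pid_to_job, "io_read_bytes"))
```

Here the two tasks of `job_1` on different nodes are merged into one series, while the unmapped process lands in `other`. A real tool would also need to track process lifetimes and task-to-job mappings as they change over time, which this sketch omits.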
KEYWORDS: Resource Attribution, Metrics Correlation, Data-Intensive Systems, Monitoring.