PARALLEL DATA LAB 

PDL Abstract

Otus: Resource Attribution in Data-Intensive Clusters

MapReduce'11, June 8, 2011, San Jose, California, USA

Kai Ren, Julio López, Garth A. Gibson

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Frameworks for large scale data-intensive applications, such as Hadoop and Dryad, have gained tremendous popularity. Understanding the resource requirements of these frameworks and the performance characteristics of distributed applications is inherently difficult. We present an approach, based on resource attribution, that aims at facilitating performance analyses of distributed data-intensive applications. This approach is embodied in Otus, a monitoring tool to attribute resource usage to jobs and services in Hadoop clusters. Otus collects and correlates performance metrics from distributed components and provides views that display time-series of these metrics filtered and aggregated using multiple criteria. Our evaluation shows that this approach can be deployed without incurring major overheads. Our experience with Otus in a production cluster suggests its effectiveness at helping users and cluster administrators with application performance analysis and troubleshooting.
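
As an illustration of what resource attribution means in practice, the following is a minimal Python sketch, not Otus's actual implementation. It assumes per-process samples have already been collected on each node and that a mapping from processes to their owning job or service is available; the sample layout, the `owners` table, and the `attribute` function are all hypothetical names introduced for this example.

```python
from collections import defaultdict

# Hypothetical per-process samples: (timestamp, node, pid, cpu, rss_mb).
samples = [
    (0, "node1", 101, 0.5, 200),
    (0, "node1", 102, 0.2, 100),
    (0, "node2", 201, 0.7, 300),
    (1, "node1", 101, 0.6, 210),
    (1, "node2", 201, 0.4, 310),
]

# Hypothetical mapping from (node, pid) to the job or service owning the process.
owners = {
    ("node1", 101): "job_A",
    ("node1", 102): "datanode",
    ("node2", 201): "job_A",
}

def attribute(samples, owners):
    """Sum each metric per (owner, timestamp) across all nodes."""
    series = defaultdict(lambda: defaultdict(lambda: {"cpu": 0.0, "rss_mb": 0}))
    for ts, node, pid, cpu, rss in samples:
        # Processes with no known owner are grouped under a catch-all label.
        owner = owners.get((node, pid), "unattributed")
        point = series[owner][ts]
        point["cpu"] += cpu
        point["rss_mb"] += rss
    return series

series = attribute(samples, owners)

# Print the cluster-wide time series for one job.
for ts in sorted(series["job_A"]):
    print(ts, series["job_A"][ts])
```

Grouping by owner first and timestamp second yields one cluster-wide time series per job or service, which is the kind of filtered, aggregated view the abstract describes. Other grouping keys (per node, per user) would follow the same pattern.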

KEYWORDS: Resource Attribution, Metrics Correlation, Data-Intensive Systems, Monitoring.

FULL PAPER: pdf