PARALLEL DATA LAB 

PDL Abstract

Automation Without Predictability is a Recipe for Failure

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-11-101. January 2011.

Raja R. Sambasivan & Gregory R. Ganger

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Automated management seems a must as distributed systems and datacenters continue to grow in scale and complexity. But automation of performance problem diagnosis and tuning relies upon predictability, which in turn relies upon low variance: most automation tools are not effective when variance is regularly high. This paper argues that, for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for understanding sources of variance and present an example tool that helps identify them.

KEYWORDS: automated management, automation, autonomic computing, datacenters, distributed systems, performance problem diagnosis, predictability, variance
