PARALLEL DATA LAB 

PDL Abstract

Automated Diagnosis without Predictability is a Recipe for Failure

Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '12), June 12-13, 2012, Boston, MA.

Raja R. Sambasivan & Gregory R. Ganger

Electrical & Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Automated management is critical to the success of cloud computing, given its scale and complexity. But most systems do not satisfy one of the key properties required for automation: predictability, which in turn relies upon low variance. Most automation tools are not effective when variance is consistently high. Using automated performance diagnosis as a concrete example, this position paper argues that for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for reasoning about sources of variance in distributed systems and describe an example tool for helping identify them.
