Visualizing Request-flow Comparison to Aid Performance Diagnosis in Distributed Systems

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-102. May 2012.

Raja R. Sambasivan, Ilari Shafer, Michelle L. Mazurek, Gregory R. Ganger

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance decreases, the problem might be in any of the system's many components or could be a result of poor interactions among them. Recent research has provided the ability to automatically identify a small set of most likely problem locations, leaving the diagnoser with the task of exploring just that set. This paper describes and evaluates three approaches for visualizing the results of a proven technique called "request-flow comparison" for identifying likely causes of performance decreases in a distributed system. Our user study provides a number of insights useful in guiding visualization tool design for distributed system diagnosis. For example, we find that both an overlay-based approach (e.g., diff) and a side-by-side approach are effective, with tradeoffs for different users (e.g., expert vs. not) and different problem types. We also find that an animation-based approach is confusing and difficult to use.

KEYWORDS: distributed systems, performance diagnosis, request-flow comparison, user study, visualization
