PARALLEL DATA LAB 

PDL Abstract

Categorizing and Differencing System Behaviours

Second Workshop on Hot Topics in Autonomic Computing. June 15, 2007. Jacksonville, FL.

Raja R. Sambasivan, Alice X. Zheng, Eno Thereska, Gregory R. Ganger

Dept. of Electrical and Computer Engineering
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Making request flow tracing an integral part of software systems creates the potential to better understand their operation. The resulting traces can be converted to per-request graphs of the work performed by a service, representing the flow and timing of each request’s processing. Collectively, these graphs contain detailed and comprehensive data about the system’s behavior and the workload that induced it, leaving the challenge of extracting insights. Categorizing and differencing such graphs should greatly improve our ability to understand the runtime behavior of complex distributed services and diagnose problems. Clustering the set of graphs can identify common request processing paths and expose outliers. Moreover, clustering two sets of graphs can expose differences between the two; for example, a programmer could diagnose a newly arisen problem by comparing current request processing with that of an earlier non-problem period and focusing on the aspects that changed. Such categorizing and differencing of system behavior can be a significant step toward automated problem diagnosis.
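
To make the idea of categorizing and differencing concrete, the following is a minimal Python sketch and not the method from the paper. It assumes each request graph is a list of (source, destination) edges, and it substitutes exact-match grouping on graph structure for a real clustering algorithm. Timing information is ignored. The names categorize, outliers, difference, and max_share are hypothetical and chosen only for illustration.

```python
from collections import Counter


def categorize(graphs):
    """Group request graphs by structure.

    Each graph is assumed to be a list of (source, destination) edges.
    Graphs with the same set of edges fall into the same category.
    Returns a Counter mapping a structural signature to a request count.
    """
    return Counter(tuple(sorted(set(edges))) for edges in graphs)


def outliers(categories, max_share=0.05):
    """Return signatures whose share of all requests is at most max_share."""
    total = sum(categories.values())
    if total == 0:
        return []
    return [sig for sig, n in categories.items() if n / total <= max_share]


def difference(before, after):
    """Compare two categorized periods.

    Returns (signature, change in request share) pairs, sorted so that the
    categories whose share changed the most come first.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both periods need at least one request")
    changes = {
        sig: after[sig] / total_after - before[sig] / total_before
        for sig in set(before) | set(after)
    }
    return sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical request paths through an illustrative storage service.
cache_hit = [("client", "frontend"), ("frontend", "cache")]
cache_miss = cache_hit + [("cache", "disk")]

# Earlier non-problem period versus the current period.
before = categorize([cache_hit] * 9 + [cache_miss] * 1)
after = categorize([cache_hit] * 5 + [cache_miss] * 5)

for signature, change in difference(before, after):
    print(f"{change:+.2f}  {signature}")
```

In this toy input the share of requests that reach the disk grows between the two periods, which is the kind of shift a programmer would then investigate. A practical system would likely need a similarity-based clustering that tolerates small structural variations and accounts for per-edge timing, neither of which this sketch attempts.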
