DATE: Thursday, February 8, 2007
TIME: 12:00 pm - 1:00 pm
PLACE: CIC 2101

SPEAKER:
Alice Zheng
CMU

TITLE:
Statistical Failure Diagnosis in Software and Systems

ABSTRACT:
As software and systems become increasingly complex, the task of debugging also becomes increasingly difficult. Manual diagnosis can require sifting through millions of lines of code and output logs. In addition, large systems often contain many components, each complex on its own, and often interacting in unexpected ways.

In this talk, I give two examples of how statistical machine learning algorithms, along with appropriate instrumentation, can aid in failure diagnosis. The first example is an automatic software debugger that collects information from past successes and failures to locate suspicious program predicates. The data is obtained via fine-grained instrumentation of the program. We demonstrate a bi-clustering algorithm that is effective at simultaneously clustering failed runs and selecting useful predicates in several real-world programs.

The second example comes from performance diagnosis in a distributed file system. We obtain snapshots of the system that contain coarse-grained traces of each file access request. We show that standard clustering techniques can separate requests into meaningful categories and pinpoint the key differences between snapshots.

Work on the software debugger was done in collaboration with Ben Liblit (U. Wisconsin, Madison), Michael Jordan (U.C. Berkeley), Alex Aiken and Mayur Naik (Stanford). Work on performance diagnosis is a collaborative effort with Raja Sambasivan and Greg Ganger (CMU).

BIO:
Alice Zheng received her Ph.D. from UC Berkeley in 2005 and is currently a postdoctoral fellow in the Parallel Data Laboratory at Carnegie Mellon University. Her interests lie in applied machine learning, particularly as applied to computer systems, software, and networks. Current projects include statistical software debugging, performance diagnosis of distributed file systems, efficient internet traffic measurements, and modeling social networks.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/