PARALLEL DATA LAB 

PDL Abstract

Using Provenance to Aid in Personal File Search

USENIX '07 Annual Technical Conference, Santa Clara, CA, June 17–22, 2007.

Sam Shah* Craig A. N. Soules† Gregory R. Ganger‡ Brian D. Noble*

*University of Michigan
†HP Labs
‡Carnegie Mellon University

Parallel Data Laboratory, Carnegie Mellon University.
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

As the scope of personal data grows, it becomes increasingly difficult to find what we need when we need it. Desktop search tools provide a potential answer, but most existing tools are incomplete solutions: they index content but fail to capture dynamic relationships from the user’s context. One emerging solution to this problem is context-enhanced search, a technique that reorders and extends the results of content-only search using contextual information. Within this framework, we propose directing contextual searches with strict causality rather than temporal locality, the current state of the art. Causality more accurately identifies data flow between files, reducing the false positives created by context switching and background noise. Further, unlike previous work, we conduct an online user study with a fully functioning implementation to evaluate user-perceived search quality directly. Across all queries, users rate search results generated by our causality mechanism 17% higher on average than results from content-only search or from context-enhanced search with temporal locality, a statistically significant improvement.
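
To make the idea of causality-directed context concrete, here is a minimal sketch in Python. It is not the paper's implementation. The event format, the function names build_causal_graph and context_enhanced_search, and the decay and max_hops parameters are all assumptions chosen for illustration. The sketch links two files only when a single process read one and later wrote the other, then uses those links to extend and rank content-only hits.

```python
from collections import defaultdict


def build_causal_graph(events):
    """Build a file relationship graph from traced I/O events.

    events: iterable of (pid, op, path) tuples in time order, where op is
    "read" or "write". This format is an assumption for illustration.
    Two files are linked only if one process read the first and later
    wrote the second, so unrelated processes running at the same time
    contribute no links.
    """
    reads_by_pid = defaultdict(set)
    graph = defaultdict(set)
    for pid, op, path in events:
        if op == "read":
            reads_by_pid[pid].add(path)
        elif op == "write":
            for src in reads_by_pid[pid]:
                if src != path:
                    # Store the link in both directions so a hit on either
                    # file can surface the other.
                    graph[src].add(path)
                    graph[path].add(src)
    return graph


def context_enhanced_search(content_hits, graph, decay=0.5, max_hops=2):
    """Extend and rank content-only hits using the relationship graph.

    content_hits: dict mapping path -> content-only score.
    Files reachable from a hit receive the hit's score multiplied by
    `decay` per hop (hypothetical weighting), up to `max_hops` hops.
    """
    scores = dict(content_hits)
    frontier = dict(content_hits)
    for _ in range(max_hops):
        next_frontier = {}
        for path, score in frontier.items():
            for neighbor in graph.get(path, ()):
                if neighbor in scores:
                    continue  # already ranked by content or a shorter path
                candidate = score * decay
                if candidate > next_frontier.get(neighbor, 0.0):
                    next_frontier[neighbor] = candidate
        scores.update(next_frontier)
        frontier = next_frontier
    # Highest score first; ties broken by path for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, a file read by a background process between the user's read and write is never linked to the user's files, whereas a purely time-window-based scheme would associate it with both. That is the kind of false positive the abstract attributes to temporal locality.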

FULL PAPER: pdf