ABSTRACT

    SOSP'05, October 23–26, 2005, Brighton, United Kingdom. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-105, April 2005.

    Connections: Using Context to Enhance File Search

    Craig A.N. Soules, Gregory R. Ganger

    Parallel Data Laboratory, Carnegie Mellon University.
    Pittsburgh, PA 15213

    http://www.pdl.cmu.edu/

    Connections is a file system search tool that combines traditional content-based search with context information gathered from user activity. By tracing file system calls, Connections can identify temporal relationships between files and use them to expand and reorder traditional content search results. Doing so improves both recall (reducing false negatives) and precision (reducing false positives). For example, on the first ten results, Connections improves average recall from 13% to 22% and average precision from 23% to 29%. When averaged across all recall levels, Connections improves precision from 17% to 28%. Connections provides these benefits with only modest increases in average query time (2 seconds), indexing time (23 seconds daily), and index size (under 1% of the user's data set).
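
    The idea can be illustrated with a minimal sketch. It is not the algorithm from the paper, which this abstract does not describe in detail. It assumes a trace of (timestamp, path) file accesses, links files accessed within a fixed time window of each other, passes a share of each content hit's score to its linked files, and then measures precision and recall at a cutoff. The function names, the 30-second window, and the 0.5 weight are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_relation_graph(trace, window=30.0):
    """Link files accessed within `window` seconds of each other.

    trace: list of (timestamp_seconds, path), sorted by timestamp.
    The window length is an illustrative assumption.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, (t_i, f_i) in enumerate(trace):
        for t_j, f_j in trace[i + 1:]:
            if t_j - t_i > window:
                break  # trace is sorted, so later accesses are farther away
            if f_j != f_i:
                graph[f_i][f_j] += 1
                graph[f_j][f_i] += 1
    return graph


def expand_and_rerank(content_hits, graph, weight=0.5):
    """Add context-related files to content results and reorder them.

    content_hits: dict mapping path -> content-search score.
    Each hit passes `weight` of its score to its neighbors, split in
    proportion to link counts (a hypothetical scoring rule).
    """
    scores = defaultdict(float, content_hits)
    for path, score in content_hits.items():
        total = sum(graph[path].values())
        for neighbor, count in graph[path].items():
            scores[neighbor] += weight * score * count / total
    # Highest score first; ties broken by path for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))


def precision_recall_at_k(ranked, relevant, k=10):
    """Precision and recall over the first k results."""
    top = ranked[:k]
    hits = sum(1 for p in top if p in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    trace = [(0, "report.tex"), (5, "figure.eps"),
             (10, "data.csv"), (500, "notes.txt")]
    graph = build_relation_graph(trace)
    ranked = expand_and_rerank({"report.tex": 1.0}, graph)
    relevant = {"report.tex", "figure.eps", "notes.txt"}
    print(ranked)
    print(precision_recall_at_k(["report.tex"], relevant))
    print(precision_recall_at_k(ranked, relevant))
```

    In this toy trace, expansion retrieves figure.eps, which content search alone missed, so recall rises. It also pulls in data.csv, which is not relevant, so precision on this query falls. This shows the trade-off that any context-based expansion has to manage; the figures reported in the abstract are averages measured on real user data, not properties of this sketch.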

    KEYWORDS: file search, contextual search, successor models

    FULL PAPER (CONFERENCE VERSION): pdf
    FULL PAPER (TR VERSION): pdf



    Last updated 30 August, 2005