PARALLEL DATA LAB 

PDL Abstract

Diamond: A Storage Architecture for Early Discard in Interactive Search

Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST '04). San Francisco, CA. March 31, 2004.

Larry Huston,† Rahul Sukthankar,†° Rajiv Wickremesinghe,†‡ M. Satyanarayanan,†° Gregory R. Ganger,° Erik Riedel,* Anastassia Ailamaki°

† Intel Research Pittsburgh
° Carnegie Mellon University
‡ Duke University
* Seagate Research

Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

This paper explores the concept of early discard for interactive search of unindexed data. Processing data inside storage devices using downloaded searchlet code enables Diamond to perform efficient, application-specific filtering of large data collections. Early discard helps users who are looking for “needles in a haystack” by eliminating the bulk of the irrelevant items as early as possible. A searchlet consists of a set of application-generated filters that Diamond uses to determine whether an object may be of interest to the user. The system optimizes the evaluation order of the filters based on run-time measurements of each filter’s selectivity and computational cost. Diamond can also dynamically partition computation between the storage devices and the host computer to adjust for changes in hardware and network conditions. Performance numbers show that Diamond dynamically adapts to a query and to run-time system state. An informal user study of an image retrieval application supports our belief that early discard significantly improves the quality of interactive searches.
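The filter-ordering idea can be illustrated with a minimal sketch. It is not Diamond's actual algorithm or API, which the abstract does not specify. It assumes filters act independently and uses a common greedy heuristic: run first the filter with the lowest measured cost per unit of discarded data. The names `Filter`, `unit_cost`, `rank`, and `evaluate` are hypothetical, and `unit_cost` stands in for the elapsed time a real system would measure at run time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Filter:
    """Hypothetical filter plus running measurements (illustrative only)."""
    name: str
    predicate: Callable[[dict], bool]
    unit_cost: float          # assumed stand-in for measured time per call
    total_cost: float = 0.0   # accumulated cost over all evaluations
    evaluated: int = 0        # objects this filter has examined
    passed: int = 0           # objects this filter let through

    def pass_rate(self) -> float:
        # With no measurements yet, assume every object passes.
        return self.passed / self.evaluated if self.evaluated else 1.0

    def mean_cost(self) -> float:
        return self.total_cost / self.evaluated if self.evaluated else self.unit_cost

    def rank(self) -> float:
        # Lower is better: average cost divided by the fraction discarded.
        drop_rate = 1.0 - self.pass_rate()
        return self.mean_cost() / drop_rate if drop_rate > 0 else float("inf")


def current_order(filters: Iterable[Filter]) -> List[Filter]:
    """Return filters sorted by rank; ties keep their original order."""
    return sorted(filters, key=Filter.rank)


def evaluate(obj: dict, filters: Iterable[Filter]) -> bool:
    """Run filters in rank order and discard at the first rejection."""
    for f in current_order(filters):
        f.evaluated += 1
        f.total_cost += f.unit_cost
        if not f.predicate(obj):
            return False      # early discard: later filters never run
        f.passed += 1
    return True
```

Because the statistics are updated on every object, the order can change while a search is running: a cheap filter that rejects most objects tends to move ahead of an expensive one that rejects few. The pass rates measured this way are conditional on the filters that ran earlier, so the ordering is a heuristic rather than a guaranteed optimum.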
