Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST '04). San Francisco, CA. March 31, 2004.
Larry Huston, Rahul Sukthankar,° Rajiv Wickremesinghe,
M. Satyanarayanan,° Gregory R. Ganger,° Erik Riedel,* Anastassia Ailamaki°
Intel Research Pittsburgh
° Carnegie Mellon University
Duke University
* Seagate Research
Carnegie Mellon University
Pittsburgh, PA 15213
http://www.pdl.cmu.edu/
This paper explores the concept of early discard for interactive search
of unindexed data. Processing data inside storage devices using downloaded
searchlet code enables Diamond to perform efficient, application-specific
filtering of large data collections. Early discard helps users who are
looking for needles in a haystack by eliminating the bulk
of the irrelevant items as early as possible. A searchlet consists of
a set of application-generated filters that Diamond uses to determine
whether an object may be of interest to the user. The system optimizes
the evaluation order of the filters based on run-time measurements of
each filter's selectivity and computational cost. Diamond can also
dynamically partition computation between the storage devices and the
host computer to adjust for changes in hardware and network conditions.
Performance numbers show that Diamond dynamically adapts to a query
and to run-time system state. An informal user study of an image retrieval
application supports our belief that early discard significantly improves
the quality of interactive searches.
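The abstract says Diamond orders a searchlet's filters using run-time measurements of each filter's selectivity and cost, and discards an object as soon as any filter rejects it. Below is a minimal Python sketch of that idea. It assumes the filters are independent and uses the classic greedy rule of running filters in increasing order of average cost / (1 - pass rate). This rule, and the names `Filter`, `run_searchlet`, `unit_cost` and `reorder_every`, are illustrative assumptions, not Diamond's actual ordering policy or API. Each filter declares a hypothetical per-object cost in place of a real timer, which keeps the example deterministic.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Filter:
    """One filter in a hypothetical searchlet: a predicate plus run-time stats."""
    name: str
    predicate: Callable[[int], bool]
    unit_cost: float      # hypothetical per-object cost, standing in for a timer
    seen: int = 0         # objects that reached this filter
    passed: int = 0       # objects this filter accepted
    spent: float = 0.0    # accumulated evaluation cost

    def rank(self) -> float:
        """Expected cost per discarded object: avg cost / (1 - pass rate)."""
        if self.seen == 0:
            return 0.0  # unmeasured filters sort first so they get sampled
        avg_cost = self.spent / self.seen
        drop_rate = 1.0 - self.passed / self.seen
        # A filter that never discards anything should run last.
        return math.inf if drop_rate == 0 else avg_cost / drop_rate


def run_searchlet(
    filters: List[Filter], objects: Iterable[int], reorder_every: int = 10
) -> Tuple[List[int], List[Filter]]:
    """Evaluate filters with early discard, periodically re-sorting by rank."""
    order = list(filters)
    survivors: List[int] = []
    for i, obj in enumerate(objects):
        if i % reorder_every == 0:
            order.sort(key=lambda f: f.rank())  # stable sort keeps ties in place
        for f in order:
            f.seen += 1
            f.spent += f.unit_cost
            if not f.predicate(obj):
                break  # early discard: the remaining filters never run
            f.passed += 1
        else:
            survivors.append(obj)  # every filter accepted the object
    return survivors, order
```

For example, given a costly filter that keeps one object in three and a cheap filter that keeps one in two, the sketch moves the cheap filter to the front after the first batch of measurements, so the costly filter only sees objects that already survived the cheap one. The greedy rule is optimal only for independent filters; correlated filters would call for a different policy.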