PARALLEL DATA LAB 

PDL Abstract

Active Disks for Large-Scale Data Processing

Appears in IEEE Computer, June 2001.

Erik Riedel*, Christos Faloutsos, Garth A. Gibson, David Nagle

*Seagate Technology, Pittsburgh, PA

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/Active

As processor performance increases and memory cost decreases, system intelligence continues to move away from the CPU and into peripherals. Storage system designers use this trend toward excess computing power to perform more complex processing and optimizations inside storage devices. To date, such optimizations take place at relatively low levels of the storage protocol. Trends in storage density, mechanics, and electronics eliminate the hardware bottleneck and put pressure on interconnects and hosts to move data more efficiently.

We propose using an active disk storage device that combines on-drive processing and memory with software downloadability to allow disks to execute application-level functions directly at the device. Moving portions of an application's processing to a storage device significantly reduces data traffic and leverages the parallelism already present in large systems, dramatically reducing the execution time for many basic data mining tasks.
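To make the data-traffic argument concrete, the following is a minimal sketch of a selective scan run two ways: once with the host filtering every record it reads, and once with the predicate executed at each drive. Everything in it is an assumption made for illustration, not the paper's interface: the `Disk` class, its `read_all` and `scan` methods, the 64-byte record size, and the sample predicate are all hypothetical.

```python
from dataclasses import dataclass

RECORD_BYTES = 64  # assumed fixed record size, for illustration only


@dataclass
class Disk:
    """Hypothetical drive holding a list of integer records."""
    records: list

    def read_all(self):
        # Traditional disk: every record crosses the interconnect.
        return list(self.records), len(self.records) * RECORD_BYTES

    def scan(self, predicate):
        # Active disk: the downloaded predicate runs at the device,
        # so only matching records cross the interconnect.
        matches = [r for r in self.records if predicate(r)]
        return matches, len(matches) * RECORD_BYTES


def host_side_filter(disks, predicate):
    """Ship all data to the host, then filter there."""
    results, traffic = [], 0
    for disk in disks:
        data, nbytes = disk.read_all()
        traffic += nbytes
        results.extend(r for r in data if predicate(r))
    return results, traffic


def active_disk_filter(disks, predicate):
    """Filter at each drive; the host only merges the matches."""
    results, traffic = [], 0
    for disk in disks:
        matches, nbytes = disk.scan(predicate)
        traffic += nbytes
        results.extend(matches)
    return results, traffic


# Four hypothetical drives with 1,000 records each.
disks = [Disk(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]


def is_match(record):
    # Sample predicate: selects 1% of the records.
    return record % 100 == 0


host_result, host_traffic = host_side_filter(disks, is_match)
active_result, active_traffic = active_disk_filter(disks, is_match)

print(host_traffic, active_traffic)
```

Both paths return the same records. In this toy setup the traffic differs by the selectivity of the predicate, and each drive's scan is independent of the others, which is where the parallelism across drives would come from. The sketch models only bytes moved; it says nothing about on-drive CPU or memory limits.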
