MultiMap: Preserving Disk Locality for Multidimensional Datasets
IEEE 23rd International Conference on Data Engineering (ICDE 2007) Istanbul, Turkey, April 2007. Supercedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-102. March 2005.
Minglong Shao*, Steven W. Schlosser†, Stratos Papadomanolakis*, Jiri Schindler‡,
Anastassia Ailamaki*,
Gregory R. Ganger*
* Carnegie Mellon University
†Intel Research Pittsburgh
‡ Network Appliance, Inc.
http://www.pdl.cmu.edu/
MultiMap is an algorithm for mapping multidimensional
datasets so as to preserve the data’s spatial locality on
disks. Without revealing disk-specific details to applications,
MultiMap exploits modern disk characteristics to provide
full streaming bandwidth for one (primary) dimension
and maximally efficient non-sequential access (i.e., minimal
seek and no rotational latency) for the other dimensions.
This is in contrast to existing approaches, which
either severely penalize non-primary dimensions or fail to
provide full streaming bandwidth for any dimension. Experimental
evaluation of a prototype implementation demonstrates
MultiMap’s superior performance for range and
beam queries. On average,MultiMap reduces total I/O time
by over 50% when compared to traditional linearized layouts
and by over 30%when compared to space-filling curve
approaches such as Z-ordering and Hilbert curves. For
scans of the primary dimension, MultiMap and traditional
linearized layouts provide almost two orders of magnitude
higher throughput than space-filling curve approaches.
FULLCONFERENCE PAPER: pdf
FULL TR: pdf