ABSTRACT

    IEEE 23rd International Conference on Data Engineering (ICDE 2007) Istanbul, Turkey, April 2007. Supercedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-102. March 2005.

    MultiMap: Preserving Disk Locality for Multidimensional Datasets

    Minglong Shao*, Steven W. Schlosser†, Stratos Papadomanolakis*, Jiri Schindler‡, Anastassia Ailamaki*,
    Gregory R. Ganger*

    * Carnegie Mellon University
    †Intel Research Pittsburgh
    ‡ Network Appliance, Inc.

    http://www.pdl.cmu.edu/

    MultiMap is an algorithm for mapping multidimensional datasets so as to preserve the data’s spatial locality on
    disks. Without revealing disk-specific details to applications, MultiMap exploits modern disk characteristics to provide full streaming bandwidth for one (primary) dimension and maximally efficient non-sequential access (i.e., minimal
    seek and no rotational latency) for the other dimensions. This is in contrast to existing approaches, which either severely penalize non-primary dimensions or fail to provide full streaming bandwidth for any dimension. Experimental evaluation of a prototype implementation demonstrates MultiMap’s superior performance for range and beam queries. On average,MultiMap reduces total I/O time by over 50% when compared to traditional linearized layouts and by over 30%when compared to space-filling curve approaches such as Z-ordering and Hilbert curves. For scans of the primary dimension, MultiMap and traditional linearized layouts provide almost two orders of magnitude higher throughput than space-filling curve approaches.

    FULLCONFERENCE PAPER: pdf
    FULL TR: pdf


    PDL Home Publications Home

    © 2008.
    Last updated 4 April, 2007