SPAA ’25, July 28–August 1, 2025, Portland, OR, USA.
Yiwei Zhao, Hongbo Kang*, Yan Gu^, Guy E. Blelloch, Laxman Dhulipala†, Charles McGuffey**, Phillip B. Gibbons
Carnegie Mellon University
*Tsinghua University
^University of California, Riverside
†University of Maryland
**Reed College
The kd-tree is a widely used data structure for managing multidimensional data. However, most existing kd-tree designs suffer from the memory wall: they are bottlenecked by off-chip memory latency and bandwidth limitations. Processing-in-memory (PIM), an emerging architectural paradigm, offers a promising solution to this issue by integrating processors (PIM cores) inside memory modules and offloading computational tasks to these PIM cores. This approach enables low-latency on-chip memory access and provides bandwidth that scales with the number of PIM modules, significantly reducing off-chip memory traffic.
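For readers less familiar with the baseline structure, the following is a minimal textbook kd-tree sketch (sequential, purely in-DRAM): build by median-splitting on cycling axes, then answer exact point searches by descending along split coordinates. It illustrates only the conventional design whose off-chip traffic motivates the PIM approach, not the paper's PIM-kd-tree.

```python
# Minimal textbook kd-tree sketch: build + exact point search.
# Assumes distinct coordinate values along each axis, for simplicity.

class KdNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point   # the k-dimensional point stored at this node
        self.axis = axis     # splitting dimension used at this node
        self.left = left     # subtree with coordinate[axis] <  point[axis]
        self.right = right   # subtree with coordinate[axis] >= point[axis]

def build(points, depth=0):
    """Recursively build a kd-tree by median-splitting on cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KdNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))

def contains(node, query):
    """Exact point search: descend by comparing on each node's split axis."""
    while node is not None:
        if node.point == query:
            return True
        node = node.left if query[node.axis] < node.point[node.axis] else node.right
    return False

# Example usage
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build(pts)
print(contains(root, (9, 6)), contains(root, (1, 1)))  # True False
```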
This paper introduces the PIM-kd-tree, the first theoretically grounded kd-tree design specifically tailored for PIM systems. The PIM-kd-tree is built upon a novel log-star tree decomposition that leverages local intra-component caching. In conjunction with other innovative techniques, including low-overhead approximate counters for updates, delayed updates for load balancing, and further PIM-friendly design choices, the PIM-kd-tree supports highly efficient batch-parallel construction, point searches, dynamic updates, orthogonal range queries, and kNN searches. Notably, all these operations are work-efficient and load-balanced even under adversarial skew, and incur only O(log* P) communication overhead (off-chip memory traffic) per query. Furthermore, we prove that our data structure achieves, with high probability, an optimal trade-off between communication, space, and batch size. Finally, we present efficient parallel algorithms for two prominent clustering problems, density peak clustering and DBSCAN, utilizing the PIM-kd-tree and its techniques.
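To make the batch-parallel setting concrete, here is a hypothetical sketch of the batching idea behind PIM-friendly index structures: a host CPU groups a batch of point queries by the PIM core owning the relevant partition, sends one message per core, and each core answers its queries locally from on-chip memory. The partition map below (fixed coordinate ranges, with assumed names NUM_PIM_CORES, RANGE_PER_CORE, owner, route_batch) is an illustrative assumption only; it is not the paper's log-star decomposition, approximate counters, delayed updates, or load-balancing scheme.

```python
# Hypothetical batching sketch: group a query batch by destination PIM core
# so that communication is one message per core rather than one per query.
# The fixed-range ownership rule is an illustrative assumption, not the
# paper's partitioning method.
from collections import defaultdict

NUM_PIM_CORES = 4     # assumed number of PIM modules (illustrative)
RANGE_PER_CORE = 25   # assume core i owns x-coordinates [25*i, 25*(i+1))

def owner(query):
    """Map a query point to the PIM core owning its partition (assumed scheme)."""
    return min(query[0] // RANGE_PER_CORE, NUM_PIM_CORES - 1)

def route_batch(queries):
    """Group a batch of queries so each PIM core receives a single message."""
    buckets = defaultdict(list)
    for q in queries:
        buckets[owner(q)].append(q)
    return buckets

# Example usage: each core would then answer its bucket from on-chip memory.
batch = [(3, 9), (17, 2), (42, 8), (88, 1), (55, 60), (99, 99)]
for core, qs in sorted(route_batch(batch).items()):
    print(f"core {core} handles {qs}")
```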