Parallel Data Laboratory

PDL Abstract

Scheduling Threads for Constructive Cache Sharing on CMPs

SPAA'07, June 9-11, 2007, San Diego, California, USA.

Shimin Chen** Phillip B. Gibbons** Michael Kozuch** Vasileios Liaskovitis* Anastassia Ailamaki* Guy E. Blelloch* Babak Falsafi* Limor Fix** Nikos Hardavellas* Todd C. Mowry*;** Chris Wilkerson†

*Carnegie Mellon University
**Intel Research Pittsburgh
†Intel Microprocessor Research Lab

http://www.pdl.cmu.edu/

In chip multiprocessors (CMPs), limiting the number of off- chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for construc tive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: Parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3-1.6X performance improvement relative to WS for several fine-grain parallel benchmarks on projected future CMP configurations; we also report several issues that may limit the advantage of PDF in certain applications. These results also indicate that PDF more effectively utilizes off-chip bandwidth, making it possible to trade-off on-chip cache for a larger number of cores. Moreover, we find that task granularity plays a key role in cache performance. Therefore, we present an automatic approach for selecting effective grain sizes, based on a new working set profiling algorithm that is an order of magnitude faster than previous approaches. This is the first paper demonstrating the effectiveness of PDF on real benchmarks, providing a direct comparison between PDF and WS, revealing the limiting factors for PDF in practice, and presenting an approach for overcoming these factors.

KEYWORDS: Chip Multiprocessors, Scheduling Algorithms, Constructive Cache Sharing, Parallel Depth First, Work Stealing, Thread Granularity, Working Set Profiling

FULL PAPER: pdf

PARALLEL DATA LAB

PDL Publications

PDL Abstract

Scheduling Threads for Constructive Cache Sharing on CMPs

Contact us

Recent Events

PDL Retreat 2023

PDL Retreat 2022

PDL Visit Day 2022

Social Media