ABSTRACT
SPAA'07, June 9-11, 2007, San Diego, California, USA.
Scheduling Threads for Constructive Cache Sharing on CMPs
Shimin Chen** Phillip B. Gibbons** Michael Kozuch** Vasileios Liaskovitis* Anastassia Ailamaki* Guy E. Blelloch* Babak Falsafi* Limor Fix** Nikos Hardavellas* Todd C. Mowry*;** Chris Wilkerson†
*Carnegie Mellon University
**Intel Research Pittsburgh
†Intel Microprocessor Research Lab
http://www.pdl.cmu.edu/
In chip multiprocessors (CMPs), limiting the number of off-
chip cache misses is crucial for good performance. Many
multithreaded programs provide opportunities for construc
tive cache sharing, in which concurrently scheduled threads
share a largely overlapping working set. In this paper, we
compare the performance of two state-of-the-art schedulers
proposed for fine-grained multithreaded programs: Parallel
Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is
a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3-1.6X performance
improvement relative to WS for several fine-grain parallel
benchmarks on projected future CMP configurations; we
also report several issues that may limit the advantage of
PDF in certain applications. These results also indicate that
PDF more effectively utilizes off-chip bandwidth, making it
possible to trade-off on-chip cache for a larger number of
cores. Moreover, we find that task granularity plays a key
role in cache performance. Therefore, we present an automatic approach for selecting effective grain sizes, based
on a new working set profiling algorithm that is an order
of magnitude faster than previous approaches. This is the
first paper demonstrating the effectiveness of PDF on real
benchmarks, providing a direct comparison between PDF
and WS, revealing the limiting factors for PDF in practice,
and presenting an approach for overcoming these factors.
KEYWORDS: Chip Multiprocessors, Scheduling Algorithms, Constructive
Cache Sharing, Parallel Depth First, Work Stealing, Thread
Granularity, Working Set Profiling
FULL PAPER: pdf


©
2008.
Last updated
3 April, 2007
|