Speaker: Chris Newburn, CMU
Title: Program Partitioning for a Multi-Instruction-Stream Architecture
Date: March 13, 1997
Many of the next-generation desktop systems are symmetric multiprocessors (SMPs) with not one, but two or four processors. Such systems have traditionally been used either to support parallelism in a multiprogrammed workload or to speed up loop-intensive scientific applications. But often a user wants to execute a single non-loop-intensive application as quickly as possible. The user generally does not want to worry about how to parallelize the program or about programming in a special language, and current compilation technology is not able to parallelize such applications automatically. There is a recent trend toward reduced synchronization overhead as a result of integrating multiple instruction streams onto a single chip, which in turn creates opportunities for exploiting parallelism at finer granularities.
In this talk, I will present four contributions that help solve the problem of parallelizing a single application on a system with multiple instruction streams. The first is a means of exposing the parallelism that is available in an application. The second is a new partitioning heuristic that exploits parallelism not only among individual instructions and loop iterations, but also across loop nests, procedures, and other collections of code constructs, spanning a range of granularities. The choice of what kind and granularity of parallelism to exploit is tailored to the particular application and target architecture. The third contribution is a set of synchronization techniques that have low overhead and make effective cost trade-offs for a specified target architecture. The fourth contribution is an exploration of the architectural trade-offs that can be made to best support the available parallelism.
The compilation techniques described in this talk have been implemented in a post-pass parallelizing compiler called Pedigree. The Pedigree compiler has been used to tailor the available parallelism in several application domains to different architectures, such as VLIWs, symmetric multiprocessors, simultaneous multithreaded machines, XIMDs, and MIMDs. In particular, the talk demonstrates how these new parallelization techniques can take advantage of architectural support for fine- and medium-grained parallelism.