Date: February 17, 1994

Speaker: Thomas Stricker

A Message Passing Architecture for Compiled Parallel Programs

Abstract:
Modern distributed memory parallel computers offer interconnect speeds of 100-300 Megabytes per second (Paragon, Cray T3D). At present, the communication architecture cannot sustain even a small fraction of that speed in communication patterns useful for parallel computations. In particular, it is not just the asymptotic "maximal" speed for long data streams that counts, but also how the communication primitives scale to complex data exchanges with smaller data entities.
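To make the gap concrete, a simple linear cost model suffices; the startup latency and peak rate used below are assumed figures chosen for illustration, not measurements of either machine:

\[
T(n) = t_s + \frac{n}{B_{\text{peak}}}, \qquad B_{\text{eff}}(n) = \frac{n}{T(n)}
\]

With an assumed startup cost $t_s = 50\,\mu\text{s}$ and $B_{\text{peak}} = 300$ MB/s, a 1 KByte transfer achieves

\[
B_{\text{eff}}(1024\ \text{bytes}) = \frac{1024\ \text{bytes}}{50\,\mu\text{s} + 1024/(300\ \text{MB/s})} \approx \frac{1024\ \text{bytes}}{53.4\,\mu\text{s}} \approx 19\ \text{MB/s},
\]

i.e. well under a tenth of the nominal peak, before any protocol or buffering overhead is counted.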

The message passing system of current parallel computers supports a rich set of functions (typically called services in OS-related software systems). As a result, inter-processor communication incurs a high overhead even for simple data transfers. Many functions of a message passing library are useful and necessary if each node is programmed individually, with a local view of just the node program.

The back end of a parallelizing compiler has no need for much of this message passing functionality, since compiler-generated code executes with a global, system-wide view. The compiler has access to information about the location of the data, the distribution of work, and the timing relationships within the execution. It can use this information to implement some costly functions of a standard message passing system more efficiently. For example, the compiler back end can manage message buffering directly instead of relying on the buffer allocation mechanism built into the message passing system. It can also use barrier synchronization services that are separate from data transfers. Furthermore, compilers need to transfer non-contiguous blocks of data, which is rarely supported by existing libraries because they are tailored to the limited DMA-style block transfer engines.
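As a rough illustration only (not the system described in the talk), the following C sketch shows the kind of code a compiler back end might emit for such a transfer: the communication buffer is managed by the generated code itself, a non-contiguous column is packed explicitly, and synchronization is a separate primitive. raw_put() and barrier() are hypothetical stand-ins for a thin low-level transfer layer, not a real library API.

#include <stddef.h>

/* Hypothetical low-level primitives: stand-ins for a thin transfer layer
 * the compiler back end would target.  They are not a real library API. */
static void raw_put(int dest_node, const void *buf, size_t nbytes)
{
    (void)dest_node; (void)buf; (void)nbytes;  /* stub: would drive the network interface */
}

static void barrier(void)
{
    /* stub: would perform a global barrier, separate from any data transfer */
}

/* Communication buffer managed by the generated code; its size is known at
 * compile time, so no run-time buffer negotiation with a library is needed. */
static double send_buf[1024];

/* Send column `col` of a row-major M x N matrix to a neighbor node.
 * The column is non-contiguous (stride N), so the generated code gathers it
 * explicitly rather than relying on a contiguous DMA-style block transfer. */
static void send_column(const double *a, int M, int N, int col, int dest_node)
{
    for (int i = 0; i < M; i++)
        send_buf[i] = a[(size_t)i * N + col];  /* strided gather into the managed buffer */

    raw_put(dest_node, send_buf, (size_t)M * sizeof(double));
}

/* Because the compiler knows the global schedule, one barrier can close a
 * whole exchange phase instead of synchronizing on every message. */
void exchange_phase(const double *a, int M, int N, int col, int dest_node)
{
    send_column(a, M, N, col, dest_node);
    barrier();
}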

The concepts presented in this talk attempt to find more appropriate "network" services for high performance computing. As a method for achieving my performance goals, I propose to separate the elementary functions (synchronization, data transfer, buffer management) and place each into the most appropriate component of our compiler-oriented message passing architecture. The partitioning of communication-related work among the processor, the co-processor (if present), the network interface, and the network is to be revisited to open up new opportunities for improved hardware support and for better modularization into user code, system code, and low-level handlers. I see my work as specialized to high performance computing with a global view, so that I can modify the networking service to fit a parallel compiler's needs rather than merely decompose a fixed, existing service differently.
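One way to picture this separation (the names below are purely illustrative, not the interface presented in the talk) is as three independent groups of primitives, each of which can be mapped onto whichever component (processor, co-processor, or network interface) supports it best:

#include <stddef.h>

/* Illustrative interface only: the separation into three groups is the point,
 * not the particular names, which are hypothetical. */

/* Synchronization, kept independent of any data transfer, so it can be mapped
 * onto dedicated barrier hardware or the network interface where available. */
void sync_barrier(void);

/* Data transfer: plain remote writes and reads with no implicit buffering or
 * synchronization attached. */
void xfer_put(int node, void *remote_dst, const void *local_src, size_t nbytes);
void xfer_get(int node, void *local_dst, const void *remote_src, size_t nbytes);

/* Buffer management, exposed to the compiler back end, which allocates and
 * reuses communication buffers according to its global schedule. */
void *combuf_alloc(size_t nbytes);
void  combuf_free(void *buf);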

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/