SDI Seminar

Speaker: Todd C. Mowry, Carnegie Mellon University

Date: February 3, 2000
Time: Noon
Place: Wean Hall 8220

Software-Controlled Multithreading Using Informing Memory Operations


To help tolerate the latency of accessing remote data in a shared-memory multiprocessor, we explore a novel approach to switch-on-miss multithreading that is software-controlled rather than hardware-controlled. Our technique uses informing memory operations to trigger the thread switches with sufficiently low overhead that we observe speedups of 10% or more for four out of seven applications, with one application speeding up by 14%. By selectively applying register partitioning to reduce thread switching overhead, we can achieve further gains: e.g., an overall speedup of 23% for FFT. Although this software-controlled approach does not match the performance of hardware-controlled schemes on multithreaded workloads, it requires substantially less hardware support than previous schemes and is not likely to degrade single-thread performance. As remote memory accesses continue to become more expensive relative to software overheads, we expect software-controlled multithreading to become increasingly attractive in the future.

Todd C. Mowry is an Associate Professor in the School of Computer Science at Carnegie Mellon University. He received an M.S.E.E. and a Ph.D. from Stanford University in 1989 and 1994, respectively. From 1994 through 1997, he was an Assistant Professor in the ECE and CS departments at the University of Toronto, before joining Carnegie Mellon University in July 1997. The goal of Professor Mowry's research is to develop new techniques for designing computer systems (both hardware and software) that achieve dramatic performance breakthroughs at low cost without placing any additional burden on the programmer. Specifically, he has been focusing on two areas: (i) automatically tolerating the ever-increasing relative latencies of accessing and communicating data (via DRAM, disks, and networks), which threaten to nullify any other improvements in processing efficiency; and (ii) automatically extracting thread-level parallelism from important classes of applications where this is currently not possible.