PARALLEL DATA LAB 

PDL Abstract

A Case for Hierarchical Rings with Deflection Routing: An energy-efficient on-chip communication substrate

Parallel Computing, Volume 54, May 2016, Pages 29-45, ISSN 0167-8191.

Rachata Ausavarungnirun (a), Chris Fallin (a), Xiangyao Yu (b), Kevin Kai-Wei Chang (a), Greg Nazario (a), Reetuparna Das (c), Gabriel H. Loh (d), Onur Mutlu (a)

a Carnegie Mellon University, United States
b Massachusetts Institute of Technology, United States
c University of Michigan, United States
d Advanced Micro Devices, United States

http://www.pdl.cmu.edu/

Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by reintroducing more complex in-ring buffering and buffered flow control. Our goal in this paper is to design a new hierarchical ring interconnect that maintains most of the simplicity of traditional ring designs (i.e., no in-ring buffering or buffered flow control) while scaling as well as more complex buffered hierarchical ring designs.

To this end, we revisit the concept of a hierarchical-ring network-on-chip. Our design, called HiRD (Hierarchical Rings with Deflection), includes critical features that let it retain most of the simplicity of traditional ring topologies while providing higher energy efficiency and scalability. First, HiRD does not have any buffering or buffered flow control within individual rings, and requires only a small amount of buffering between the ring hierarchy levels. When inter-ring buffers are full, our design simply deflects flits so that they circle the ring and try again, which eliminates the need for in-ring buffering. Second, we introduce two simple mechanisms that together provide an end-to-end delivery guarantee within the entire network (despite any deflections that occur) without impacting the critical path or latency of the vast majority of network traffic.
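The deflection behavior described above can be illustrated with a toy model. The following Python sketch is not the HiRD router design: the function name simulate_ring, its parameters, the single bridge at slot 0, and the fixed drain period are all assumptions made for illustration. It only shows that a flit finding the inter-ring buffer full stays on the ring and retries one lap later, so no buffering is needed inside the ring itself. It does not model the paper's two delivery-guarantee mechanisms.

```python
from collections import deque


def simulate_ring(num_slots, flits, buffer_capacity, drain_period, max_cycles=10_000):
    """Toy model of one bufferless ring with a single bridge at slot 0.

    flits: dict mapping initial slot index -> flit id (hypothetical input format).
    Every flit wants to leave the ring through the bridge.
    Returns (delivery order, per-flit deflection counts).
    """
    ring = [None] * num_slots
    for slot, flit in flits.items():
        ring[slot] = flit

    bridge = deque()  # the only buffer: a small FIFO between ring levels
    delivered = []
    deflections = {flit: 0 for flit in flits.values()}

    for cycle in range(max_cycles):
        if len(delivered) == len(flits):
            break

        # The next ring level accepts one flit every `drain_period` cycles.
        if bridge and cycle % drain_period == 0:
            delivered.append(bridge.popleft())

        # A flit passing the bridge either transfers or is deflected.
        flit = ring[0]
        if flit is not None:
            if len(bridge) < buffer_capacity:
                bridge.append(flit)
                ring[0] = None
            else:
                # Buffer full: the flit stays on the ring and retries next lap.
                deflections[flit] += 1

        # Every slot advances by one position per cycle.
        ring = [ring[-1]] + ring[:-1]

    return delivered, deflections


delivered, deflections = simulate_ring(
    num_slots=4,
    flits={0: "A", 1: "B", 2: "C", 3: "D"},
    buffer_capacity=1,
    drain_period=3,
)
```

In this toy run every flit eventually leaves the ring, but flits that reach the bridge while the one-entry buffer is occupied take extra laps. Increasing buffer_capacity or draining faster reduces the deflection counts, which is the trade-off the small inter-ring buffers are meant to balance.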

Our experimental evaluations on a wide variety of multiprogrammed and multithreaded workloads and synthetic traffic patterns show that HiRD attains equal or better performance at better energy efficiency than multiple versions of both a previous hierarchical ring design and a traditional single ring design. We also extensively analyze our design’s characteristics and injection and delivery guarantees. We conclude that HiRD can be a compelling design point that allows higher energy efficiency and scalability while retaining the simplicity and appeal of conventional ring-based designs.
