PARALLEL DATA LAB 

PDL Abstract

Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications

International Symposium on Computer Architecture (ISCA), June 2021.

Tanvir Ahmed Khan^, Dexin Zhang†, Akshitha Sriraman^*, Joseph Devietti‡, Gilles A Pokam§, Heiner Litz¶, Baris Kasikci^

*now with Carnegie Mellon University
^University of Michigan
†University of Science and Technology of China
‡University of Pennsylvania
§Intel Corporation
¶University of California, Santa Cruz

http://www.pdl.cmu.edu

Modern data center applications exhibit deep software stacks, resulting in large instruction footprints that frequently cause instruction cache misses, degrading performance, cost, and energy efficiency. Although numerous mechanisms have been proposed to mitigate instruction cache misses, they still fall short of ideal cache behavior and, furthermore, introduce significant hardware overheads. We first investigate why existing I-cache miss mitigation mechanisms achieve sub-optimal performance for data center applications. We find that widely studied instruction prefetchers fall short due to wasteful prefetch-induced cache line evictions that are not handled by existing replacement policies. Existing replacement policies are unable to mitigate wasteful evictions since they lack complete knowledge of a data center application’s complex program behavior.

To make existing replacement policies aware of these eviction-inducing program behaviors, we propose Ripple, a novel software-only technique that profiles programs and uses program context to inform the underlying replacement policy about efficient replacement decisions. Ripple carefully identifies program contexts that lead to I-cache misses and sparingly injects “cache line eviction” instructions in suitable program locations at link time. We evaluate Ripple using nine popular data center applications and demonstrate that Ripple enables any replacement policy to achieve speedup that is closer to that of an ideal I-cache. Specifically, Ripple achieves an average performance improvement of 1.6% (up to 2.13%) over prior work due to a mean 19% (up to 28.6%) I-cache miss reduction.
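
The idea of profile-guided eviction hints can be illustrated with a toy simulation. The sketch below is not Ripple's implementation: it assumes a tiny fully associative LRU cache, uses positions in a recorded trace as a stand-in for program locations, and marks a line for eviction when the profile shows no reuse within a hypothetical `window` of accesses. The names `simulate`, `profile_hints`, and `window` are illustrative only.

```python
from collections import OrderedDict


def profile_hints(trace, window):
    """Toy profiling pass (an assumption, not Ripple's analysis).

    Returns {trace index: line}, marking accesses after which the line is
    not reused within `window` subsequent accesses (or never reused).
    """
    hints = {}
    next_use = {}  # line -> index of its next access, filled while scanning backwards
    for i in range(len(trace) - 1, -1, -1):
        line = trace[i]
        nxt = next_use.get(line)
        if nxt is None or nxt - i > window:
            hints[i] = line
        next_use[line] = i
    return hints


def simulate(trace, capacity, evict_hints=None):
    """Count misses for a fully associative LRU cache of `capacity` lines.

    If `evict_hints` is given, the hinted line is dropped right after the
    access at that index, mimicking an injected eviction instruction.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hints = evict_hints or {}
    misses = 0
    for i, line in enumerate(trace):
        if line in cache:
            cache.move_to_end(line)  # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
        victim = hints.get(i)
        if victim is not None:
            cache.pop(victim, None)  # early eviction requested by the hint
    return misses


# One hot line "H" interleaved with lines that are touched only once.
trace = ["H", "s1", "s2", "H", "s3", "s4", "H", "s5", "s6", "H"]
hints = profile_hints(trace, window=3)
baseline = simulate(trace, capacity=2)
hinted = simulate(trace, capacity=2, evict_hints=hints)
print(baseline, hinted)
```

On this trace, plain LRU misses on all 10 accesses because the single-use lines keep pushing out the hot line, while the hinted run misses 7 times: evicting the single-use lines early leaves the hot line resident. A real system would attach hints to program locations rather than trace positions, and the eviction decision would come from a richer analysis of program context than a fixed reuse window.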
