PDL Abstract

Raising the Bar for Using GPUs in Software Packet Processing

12th USENIX Symposium on Networked Systems Design and Implementation (NSDI'15), March 16-18, 2015, Santa Clara, CA.

Anuj Kalia, Dong Zhou, Michael Kaminsky^, David G. Andersen

Carnegie Mellon University
^Intel Labs


Numerous recent research efforts have explored the use of Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications, typically demonstrating throughput several times higher than using legacy code on the CPU alone.

In this paper, we explore a new hypothesis about such designs: for many such applications, the benefits arise less from the GPU hardware itself than from the expression of the problem in a language such as CUDA or OpenCL that facilitates memory latency hiding and vectorization through massive concurrency. We demonstrate that in several cases, after applying a similar style of optimization to algorithm implementations, a CPU-only implementation is, in fact, more resource-efficient than the version running on the GPU. To “raise the bar” for future uses of GPUs in packet processing applications, we present and evaluate a preliminary language/compiler-based framework called G-Opt that can accelerate CPU-based packet handling programs by automatically hiding memory access latency.