Parallel Data Laboratory

PDL Abstract

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

ASPLOS 2018. The 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 24th – March 28th, Williamsburg, VA.

Amirali Boroumand¹, Saugata Ghose¹, Youngsok Kim², Rachata Ausavarungnirun¹, Eric Shiu³, Rahul Thakur³,
Daehyun Kim⁴,³, Aki Kuusela³, Allan Knies³, Parthasarathy Ranganathan³, Onur Mutlu⁵,¹

1 Carnegie Mellon University
2 Dept. of ECE, Seoul National University
3 Google
4 Samsung Research
5 ETH Zürich

http://www.pdl.cmu.edu

We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices.

In this work, we comprehensively analyze the energy and performance impact of data movement for several widely used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google’s machine learning framework; (3) video playback; and (4) video capture, the last two of which are used in many video services such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory. Each workload contains simple primitives and functions that contribute a significant amount of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the limited area and power budgets of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).

