PARALLEL DATA LAB 

PDL Abstract

Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis

3rd ACM Symposium on Cloud Computing. October 14th-17th, 2012 - San Jose, CA. 2021 SoCC Test of Time Award!

Charles Reiss^, Alexey Tumanov, Gregory R. Ganger, Randy H. Katz^, Michael A. Kozuch*

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

* Intel Labs
^ University of California, Berkeley

http://www.pdl.cmu.edu/

To better understand the challenges in developing effective cloud-based resource schedulers, we analyze the first publicly available trace data from a sizable multi-purpose cluster. The most notable workload characteristic is heterogeneity: in resource types (e.g., cores:RAM per machine) and their usage (e.g., duration and resources needed). Such heterogeneity reduces the effectiveness of traditional slot- and core-based scheduling. Furthermore, some tasks are constrained in the kinds of machines they can use, which increases the complexity of resource assignment and complicates task migration. The workload is also highly dynamic, varying over time and across most workload features, and is driven by many short jobs that demand quick scheduling decisions. While few simplifying assumptions apply, we find that many longer-running jobs have relatively stable resource utilizations, which can help adaptive resource schedulers.
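
The claim that heterogeneity reduces the effectiveness of slot-based scheduling can be illustrated with a small sketch. It is not taken from the paper or the trace: the machine size, slot size, task requests, and all function names below are hypothetical values chosen only for illustration. The sketch compares how many tasks fit on one machine when each task occupies a fixed-size slot versus when tasks are placed greedily according to their actual CPU and RAM requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    cpu: float  # requested cores (hypothetical values)
    ram: float  # requested RAM in GB (hypothetical values)


def place_with_slots(tasks, machine_cpu, machine_ram, slot_cpu, slot_ram):
    """Slot-based placement: the machine is carved into fixed-size slots,
    each task occupies one whole slot, and a task larger than a slot
    cannot be placed at all."""
    num_slots = int(min(machine_cpu // slot_cpu, machine_ram // slot_ram))
    placed = []
    for task in tasks:
        if len(placed) >= num_slots:
            break  # every slot is taken
        if task.cpu <= slot_cpu and task.ram <= slot_ram:
            placed.append(task)
    return placed


def place_by_request(tasks, machine_cpu, machine_ram):
    """Greedy first-fit on actual requests: a task is placed whenever both
    its CPU and RAM requests fit in what is still free on the machine."""
    free_cpu, free_ram = machine_cpu, machine_ram
    placed = []
    for task in tasks:
        if task.cpu <= free_cpu and task.ram <= free_ram:
            placed.append(task)
            free_cpu -= task.cpu
            free_ram -= task.ram
    return placed


# Hypothetical mix: six small tasks, one CPU-heavy task, one RAM-heavy task.
demo_tasks = [Task(0.5, 1.0)] * 6 + [Task(4.0, 4.0), Task(1.0, 16.0)]

# Hypothetical machine: 8 cores, 32 GB, split into four 2-core / 8 GB slots.
by_slots = place_with_slots(demo_tasks, 8, 32, slot_cpu=2, slot_ram=8)
by_request = place_by_request(demo_tasks, 8, 32)

print(len(by_slots), "tasks placed with fixed slots")
print(len(by_request), "tasks placed by actual requests")
```

With these made-up numbers, the four slots are filled by small tasks that each use a fraction of a slot, and the CPU-heavy and RAM-heavy tasks do not fit the slot shape at all, while request-based placement fits all eight tasks on the same machine. Real schedulers must also handle the placement constraints and workload dynamics described above, which this sketch ignores.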
