PDL Abstract

JamaisVu: Robust Scheduling with Auto-Estimated Job Runtimes

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-16-104. September 2016.

Alexey Tumanov*, Angela Jiang*, Jun Woo Park*, Michael A. Kozuch†, Gregory R. Ganger*

* Carnegie Mellon University
† Intel Labs

JamaisVu is a new end-to-end cluster scheduling system that automatically generates and robustly exploits job runtime predictions. Using runtime knowledge allows it to pack jobs with diverse time concerns (e.g., deadline vs. latency) and soft placement constraints more effectively on heterogeneous cluster resources. JamaisVu’s job runtime predictor, JVuPredict, uses a new black-box approach that tracks job runtime history as a function of multiple job submission features (e.g., user ID and program name) and then adaptively uses the most effective feature subset for each submitted job. Analysis of a one-month Google cluster trace shows that JVuPredict predicts reasonably well for complex real-world job mixes; for example, 90% of predictions are within a factor of two of the actual runtime. Because predictions cannot be perfect, JamaisVu includes new techniques for mitigating the effects of real misprediction profiles. Experiments with workloads derived from the trace show that JamaisVu performs nearly as well as a hypothetical scheduler with perfect job runtime information, and that it outperforms runtime-unaware scheduling by reducing SLO miss rate, increasing goodput, and maintaining comparable latency for best-effort jobs.
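The prediction idea summarized in the abstract (keep runtime history per combination of submission features, then pick the most effective feature subset for each job) can be illustrated with a minimal Python sketch. This is not the JVuPredict implementation: the class name `FeatureSubsetPredictor`, the median as the estimator, the mean relative error as the measure of "most effective", and the dictionary job format are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


class FeatureSubsetPredictor:
    """Hypothetical sketch: one runtime history per (feature subset, values) key."""

    def __init__(self, features):
        # Every non-empty combination of feature names is a candidate subset.
        self.subsets = [
            combo
            for size in range(1, len(features) + 1)
            for combo in combinations(features, size)
        ]
        self.history = defaultdict(list)  # key -> observed runtimes
        self.errors = defaultdict(list)   # key -> relative errors of past estimates

    def _key(self, subset, job):
        return (subset, tuple(job[name] for name in subset))

    def predict(self, job):
        """Return (estimate, subset) for the key with the lowest mean past error.

        Returns (None, None) when no subset has any history for this job.
        """
        best = None  # (mean_error, estimate, subset)
        for subset in self.subsets:
            key = self._key(subset, job)
            runtimes = self.history.get(key)
            if not runtimes:
                continue
            errs = self.errors.get(key)
            # Assumption: a key whose estimates were never scored ranks last.
            score = sum(errs) / len(errs) if errs else float("inf")
            if best is None or score < best[0]:
                best = (score, median(runtimes), subset)
        if best is None:
            return None, None
        return best[1], best[2]

    def observe(self, job, runtime):
        """Record a finished job's runtime (assumed > 0) under every subset."""
        for subset in self.subsets:
            key = self._key(subset, job)
            runtimes = self.history[key]
            if runtimes:
                # Score the estimate this key would have given before the update.
                estimate = median(runtimes)
                self.errors[key].append(abs(estimate - runtime) / runtime)
            runtimes.append(runtime)
```

In use, each completed job is passed to `observe` with its features and measured runtime, and each newly submitted job is passed to `predict`. In this sketch, a job from a user who runs programs of very different lengths tends to be predicted from the program-name history rather than the user history, because the program-name key has accumulated lower error.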

KEYWORDS: cluster scheduling, cloud systems
