DATE: Thursday, February 18, 2016
TIME: 12:00 pm - 1:00 pm
PLACE: RMCIC 4th Floor Panther Hollow Room

SPEAKER: Alexey Tumanov, CMU

TITLE: Taming Picky Jobs in a Rush: Heterogeneity and SLO-aware Cluster Scheduling

Aramid is a new cluster resource reservation system for datacenters with heterogeneous resources and interconnects. It enables predictable performance for deadline-driven production jobs and services that share a cluster with best-effort jobs. To do so, Aramid combines a declarative language that explicitly describes job-specific heterogeneity preferences with novel reservation and scheduling algorithms. Specifically, Aramid's Heterogeneous Reservation Definition Language (HRDL) allows users to request time-varying capacity reservations across heterogeneous resource collections.

But the dynamic nature of cluster environments creates a challenge: estimated job run times are often inaccurate, cluster machines may reboot and dynamically acquire new capabilities, and dynamic information on data locality and interference may become available. TetriSched is a new cluster scheduler that works in tandem with a resource reservation system to adapt dynamically to changing conditions. It continuously re-evaluates the cluster schedule, leverages jobs' time profiles and spatial constraints, and plans ahead to decide which jobs to defer so that they can wait for preferred resources. By exploiting the flexibility afforded by declaratively captured space-time soft constraints, it achieves significantly higher SLO attainment and cluster utilization.


Heterogeneity and SLO-aware resource reservations. In submission to Hadoop Summit, June 28-30, 2016, San Jose, CA.

The TetriSched project.

Alexey Tumanov is a sixth-year Ph.D. candidate at Carnegie Mellon, advised by Dr. Gregory Ganger. Prior to Carnegie Mellon, Alexey graduated with a research-based M.Sc. in Computer Science from York University and worked on para-virtualization technology at the University of Toronto as a full-time Research Assistant. His most recent research has focused on support for static and dynamic heterogeneity, hard and soft placement constraints, time-varying resource capacity guarantees, and combinatorial constraints in heterogeneous datacenters at scale.

Karen Lindenfelser, 86716, or visit