PDL Abstract

Tributary: Spot-dancing for Elastic Services with Latency SLOs

2018 USENIX Annual Technical Conference. July 11–13, 2018, Boston, MA, USA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-18-102.

Aaron Harlap§, Andrew Chung§, Alexey Tumanov†, Gregory R. Ganger§, Phillip B. Gibbons§

§ Carnegie Mellon University
† UC Berkeley

The Tributary elastic control system embraces the uncertain nature of transient cloud resources, such as AWS spot instances, to manage elastic services with latency SLOs more robustly and more cost-effectively. Such resources are available at lower cost, but with the proviso that they can be preempted en masse, making them risky to rely upon for business-critical services. Tributary creates models of preemption likelihood and exploits the partial independence among different resource offerings, selecting collections of resource allocations that satisfy SLO requirements and adjusting them over time, as client workloads change. Although Tributary’s collections are often larger than required in the absence of preemptions, they are cheaper because of both lower spot costs and partial refunds for preempted resources. At the same time, the often-larger sets allow unexpected workload bursts to be absorbed without SLO violation. Over a range of web service workloads, we find that Tributary reduces cost for achieving a given SLO by 81–86% compared to traditional scaling on non-preemptible resources, and by 47–62% compared to the high-risk approach of the same scaling with spot resources.
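As a rough illustration of why spreading a larger allocation across several resource offerings can be more robust than a single right-sized one, the sketch below computes the probability that enough instances survive to serve a target load. It is not Tributary's actual model: the function name, the pool sizes, and the preemption probabilities are hypothetical, and it assumes that pools are fully independent (the abstract only claims partial independence) and that a preempted pool loses all of its instances at once.

```python
from itertools import product


def prob_capacity_met(allocations, required):
    """Probability that surviving instances cover `required`.

    allocations: list of (instance_count, preempt_prob) per pool.
    Assumption (not from the paper): pools are preempted independently,
    and a preempted pool loses every instance in it.
    """
    total = 0.0
    # Enumerate every survive/preempt outcome across the pools.
    for outcome in product([True, False], repeat=len(allocations)):
        prob = 1.0
        capacity = 0
        for survived, (count, preempt_prob) in zip(outcome, allocations):
            if survived:
                prob *= 1.0 - preempt_prob
                capacity += count
            else:
                prob *= preempt_prob
        if capacity >= required:
            total += prob
    return total


# Hypothetical numbers: 10 instances are needed to meet the SLO.
single_pool = [(10, 0.2)]                 # exactly-sized, one offering
spread = [(5, 0.2), (5, 0.2), (5, 0.2)]   # over-provisioned, three offerings

p_single = prob_capacity_met(single_pool, required=10)
p_spread = prob_capacity_met(spread, required=10)
```

Under these made-up numbers, the single pool meets the target with probability about 0.80, while the larger, diversified allocation does so with probability about 0.90, because it tolerates the loss of any one pool. Whether the larger set is also cheaper depends on spot prices and refunds, which this sketch does not model.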