Peering through the Dark: An Owl’s View of Inter-job Dependencies and Jobs’ Impact in Shared Clusters

SIGMOD ’19, June 30–July 5, 2019, Amsterdam, Netherlands.

Andrew Chung, Carlo Curino†, Subru Krishnan†, Konstantinos Karanasos†, Panagiotis Garefalakis*, Gregory R. Ganger

Carnegie Mellon University
† Microsoft
* Imperial College London

Shared multi-tenant infrastructures have enabled companies to consolidate workloads and data, increasing data sharing and cross-organizational re-use of job outputs. This same resource- and work-sharing has also increased the risk of missed deadlines and diverging priorities, as recurring jobs and workflows developed by different teams evolve independently. To prevent incidental business disruptions, it becomes increasingly important to identify and manage job dependencies clearly. Owl is a cluster log analysis and visualization tool that (i) extracts and visualizes job dependencies derived from historical job telemetry and data provenance data sets, and (ii) introduces a novel job valuation algorithm that estimates the impact of a job on dependent users and jobs. This demonstration showcases Owl’s features that can help users identify critical job dependencies and quantify job importance based on jobs’ impact.
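The abstract does not describe how Owl derives dependencies or values jobs. As a minimal sketch, assuming provenance records of a hypothetical form `(job, dataset, access)` where `access` is `"read"` or `"write"`, a job-to-job dependency graph can be built by linking each job that writes a dataset to every job that reads it. The `downstream_impact` function below simply counts transitively dependent jobs; it is a naive, illustrative proxy for "impact" and is not Owl's job valuation algorithm. It also ignores timestamps, so a read that happened before a write would still create an edge.

```python
from collections import defaultdict, deque

# Hypothetical provenance record: (job_id, dataset, access) with access in {"read", "write"}.
# This schema is an assumption for illustration, not Owl's actual input format.
Record = tuple[str, str, str]


def build_dependency_graph(records: list[Record]) -> dict[str, set[str]]:
    """Map each producer job to the set of jobs that read a dataset it wrote."""
    writers: dict[str, set[str]] = defaultdict(set)  # dataset -> jobs that wrote it
    readers: dict[str, set[str]] = defaultdict(set)  # dataset -> jobs that read it
    for job, dataset, access in records:
        if access == "write":
            writers[dataset].add(job)
        elif access == "read":
            readers[dataset].add(job)

    graph: dict[str, set[str]] = defaultdict(set)
    for dataset, producers in writers.items():
        for producer in producers:
            for consumer in readers.get(dataset, ()):
                if consumer != producer:  # skip self-edges from read-modify-write jobs
                    graph[producer].add(consumer)
    return graph


def downstream_impact(graph: dict[str, set[str]], job: str) -> int:
    """Count jobs that transitively depend on `job` (a naive impact proxy)."""
    seen: set[str] = set()
    queue = deque(graph.get(job, ()))
    while queue:  # breadth-first traversal; `seen` guards against cycles
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, ()))
    seen.discard(job)  # a cycle may lead back to the starting job
    return len(seen)


# Example pipeline with hypothetical job and dataset names.
records: list[Record] = [
    ("ingest", "raw_logs", "write"),
    ("clean", "raw_logs", "read"),
    ("clean", "clean_logs", "write"),
    ("report", "clean_logs", "read"),
    ("dashboard", "clean_logs", "read"),
]
graph = build_dependency_graph(records)
print(downstream_impact(graph, "ingest"))  # 3: clean, report, dashboard
```

A real valuation would likely weight dependents by factors such as users affected or deadlines, which a plain count cannot capture; the traversal only shows how a dependency graph supports such estimates.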