PARALLEL DATA LAB 

PDL Abstract

MLtuner: System Support for Automatic Machine Learning Tuning

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-16-108, October 2016.

Henggang Cui, Gregory R. Ganger, and Phillip B. Gibbons

Carnegie Mellon University

http://www.pdl.cmu.edu/

MLtuner automatically tunes settings for training tunables, such as the learning rate, the mini-batch size, and the data staleness bound, that have a significant impact on large-scale machine learning (ML) performance. Traditionally, these tunables are set manually, which is unsurprisingly error-prone and difficult to do without extensive domain knowledge. MLtuner uses efficient snapshotting and optimization-guided online trial-and-error to find good initial settings as well as to re-tune settings during execution. Experiments with three real ML tasks show that MLtuner automatically achieves performance within 40–178% of that obtained with oracle knowledge of the best settings, and outperforms the oracle when no single set of settings is best for the entire execution. It also significantly outperforms most of the many feasible settings that might be used in practice.
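The snapshot-and-trial idea described above can be illustrated with a minimal Python sketch. It is not MLtuner's implementation: the toy quadratic objective, the function names (`train_steps`, `tune`, `train_with_tuning`), the candidate learning rates, and the 1% re-tuning threshold are all assumptions made for illustration, and a plain "lowest loss after a short trial" rule stands in for the optimization-guided search mentioned in the abstract.

```python
import copy

def loss(w):
    # Toy objective (assumption): a 1-D quadratic with its minimum at w = 3.
    return (w - 3.0) ** 2

def train_steps(state, lr, steps):
    # Run `steps` gradient-descent updates in place and return the final loss.
    for _ in range(steps):
        grad = 2.0 * (state["w"] - 3.0)
        state["w"] -= lr * grad
    return loss(state["w"])

def tune(state, candidates, trial_steps):
    # Snapshot the training state, then run a short trial for each candidate
    # setting from that same snapshot. The live state is never modified.
    snapshot = copy.deepcopy(state)
    best_lr, best_loss = None, float("inf")
    for lr in candidates:
        trial = copy.deepcopy(snapshot)
        trial_loss = train_steps(trial, lr, trial_steps)
        if trial_loss < best_loss:
            best_lr, best_loss = lr, trial_loss
    return best_lr

def train_with_tuning(candidates, epochs=5, steps_per_epoch=10, trial_steps=5):
    state = {"w": 0.0}
    # Pick an initial setting by trial-and-error.
    lr = tune(state, candidates, trial_steps)
    history = []
    for _ in range(epochs):
        before = loss(state["w"])
        after = train_steps(state, lr, steps_per_epoch)
        history.append((lr, after))
        # Hypothetical re-tuning trigger: loss improved by less than 1%.
        if after > 0.99 * before:
            lr = tune(state, candidates, trial_steps)
    return state, history

if __name__ == "__main__":
    final_state, log = train_with_tuning([1.5, 0.4, 0.01, 0.001])
    print(final_state, log)
```

With the candidate list shown, the trial phase rejects 1.5 (which diverges on this objective) and the two very small rates (which barely make progress), and selects 0.4. Because every trial starts from a copy of the same snapshot, a bad candidate costs only the trial steps and never corrupts the live training state.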

KEYWORDS: Big Data infrastructure, Big Learning systems

FULL TR: pdf