PARALLEL DATA LAB 

PDL Abstract

LOOKAHEAD CONVERGES TO STATIONARY POINTS OF SMOOTH NON-CONVEX FUNCTIONS

ICASSP 2020: 45th International Conference on Acoustics, Speech, and Signal Processing. Virtual Barcelona, Spain, May 4-8, 2020.

Jianyu Wang*^, Vinayak Tantia^, Nicolas Ballas^, Michael Rabbat^

*Carnegie Mellon University
^Facebook AI Research

http://www.pdl.cmu.edu/

The Lookahead optimizer [Zhang et al., 2019] was recently proposed and demonstrated to improve the performance of stochastic first-order methods for training deep neural networks. Lookahead can be viewed as a two time-scale algorithm, where the fast dynamics (inner optimizer) determine a search direction and the slow dynamics (outer optimizer) perform updates by moving along this direction. We prove that, with an appropriate choice of step-sizes, Lookahead converges to a stationary point of smooth non-convex functions. Although Lookahead is described and implemented as a serial algorithm, our analysis is based on viewing Lookahead as a multi-agent optimization method with two agents communicating periodically.
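To illustrate the two time-scale structure described above, the following is a minimal NumPy sketch of a Lookahead-style update: the inner optimizer (plain SGD here) takes k fast steps from the slow weights, and the outer step moves the slow weights along the resulting direction. The function and parameter names (lookahead_sgd, inner_steps, inner_lr, outer_lr) and the toy objective are our own illustrative choices, not the authors' code or the exact setting analyzed in the paper.

    import numpy as np

    def lookahead_sgd(grad, x0, outer_steps=50, inner_steps=5,
                      inner_lr=0.1, outer_lr=0.5):
        # grad: function returning a (possibly stochastic) gradient at x
        # x0: initial slow weights
        slow = np.asarray(x0, dtype=float)
        for _ in range(outer_steps):
            # Fast dynamics: the inner optimizer starts from the slow
            # weights and takes k first-order steps, defining a search
            # direction (fast - slow).
            fast = slow.copy()
            for _ in range(inner_steps):
                fast -= inner_lr * grad(fast)
            # Slow dynamics: move the slow weights along that direction.
            slow += outer_lr * (fast - slow)
        return slow

    if __name__ == "__main__":
        # Toy smooth non-convex objective f(x) = x^2 + 3 sin^2(x),
        # with gradient f'(x) = 2x + 3 sin(2x).
        g = lambda x: 2 * x + 3 * np.sin(2 * x)
        x = lookahead_sgd(g, x0=[2.5])
        print(x, g(x))  # gradient approaches 0 at a stationary point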

FULL PAPER: pdf