PARALLEL DATA LAB 

PDL Abstract

Cooperative SGD: A Unified Framework for the Analysis of Local-Update SGD

Journal of Machine Learning Research (JMLR), September 2021.

Jianyu Wang, Gauri Joshi

Carnegie Mellon University

http://www.pdl.cmu.edu/

When training machine learning models using stochastic gradient descent (SGD) across a large number of nodes or edge devices, the communication cost of synchronizing gradients at every iteration is a key bottleneck that limits the scalability of the system and hinders the benefit of parallel computation. Local-update SGD algorithms, where worker nodes perform local iterations of SGD and periodically synchronize their local models, can effectively reduce the communication frequency and the associated communication delay. In this paper, we propose a powerful framework, named Cooperative SGD, that subsumes a variety of local-update SGD algorithms (such as local SGD, elastic averaging SGD, and decentralized parallel SGD) and provides a unified convergence analysis. Notably, special cases of the unified convergence analysis provided by the Cooperative SGD framework yield 1) the first convergence analysis of elastic averaging SGD for general non-convex objectives, and 2) improvements upon previous analyses of local SGD and decentralized parallel SGD. Moreover, we design new algorithms, such as elastic averaging SGD with overlapped computation and communication and decentralized periodic averaging, which reach the same training loss at least 4x faster than the baselines.
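
The following is a minimal sketch (not the paper's code) of the local-update pattern that Cooperative SGD generalizes: each worker runs several local SGD steps without communication, then the workers synchronize by averaging their models. The toy quadratic objective, worker count, and hyperparameters below are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)
m, dim, tau, rounds, lr = 4, 10, 8, 50, 0.1   # workers, model size, local steps, comm. rounds, step size

# Each worker i has its own data distribution, summarized here by a target c_i,
# so its local loss is f_i(x) = 0.5 * ||x - c_i||^2 (an assumed toy objective).
targets = rng.normal(size=(m, dim))
models = np.zeros((m, dim))                    # one local model per worker

for r in range(rounds):
    for i in range(m):
        for _ in range(tau):                   # tau local SGD steps, no communication
            noise = 0.1 * rng.normal(size=dim)        # stochastic gradient noise
            grad = (models[i] - targets[i]) + noise   # gradient of 0.5 * ||x - c_i||^2
            models[i] -= lr * grad
    # Periodic synchronization: full averaging of all local models.
    models[:] = models.mean(axis=0)

# The minimizer of the average objective is the mean of the targets.
print("distance to optimum:", np.linalg.norm(models[0] - targets.mean(axis=0)))

Replacing the full averaging step with a sparse mixing (gossip) matrix, or adding an auxiliary anchor model, would correspond to the decentralized and elastic-averaging variants that the framework covers; those details are in the full paper.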

Keywords: Communication-efficient training, distributed SGD with local updates, distributed optimization, federated learning, convergence analysis

FULL PAPER: pdf