PARALLEL DATA LAB 

PDL Abstract

Solving the Straggler Problem with Bounded Staleness

14th USENIX HotOS Workshop, Santa Ana Pueblo, NM, May 13-15, 2013.

James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth A. Gibson,
Kimberly Keeton*, Eric Xing

Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

*HP Labs

http://www.pdl.cmu.edu/

Many important applications fall into the broad class of iterative convergent algorithms. Parallel implementation of these algorithms are naturally expressed using the Bulk Synchronous Parallel (BSP) model of computation. However, implementations using BSP are plagued by the straggler problem, where every transient slowdown of any given thread can delay all other threads. This paper presents the Stale Synchronous Parallel (SSP) model as a generalization of BSP that preserves many of its advantages, while avoiding the straggler problem. Algorithms using SSP can execute efficiently, even with significant delays in some threads, addressing the oft-faced straggler problem.

FULL TR: pdf