PDL Abstract

Agility and Performance in Elastic Distributed Storage

ACM Transactions on Storage, Vol. 10, No. 4, Article 16, October 2014.

Lianghong Xu, James Cipar, Elie Krevat, Alexey Tumanov, And Nitin Gupta,
Michael A. Kozuch*, Gregory R. Ganger

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

* Intel Labs


Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Due to the large amount of data they must migrate during elastic resizing, state of the art designs usually have to make painful trade-offs among performance, elasticity, and agility. This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers, while retaining elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers where writes to overloaded servers are redirected.

This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before deactivation or activation of servers. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers and experiments with the SpringFS prototype confirm SpringFS’s agility, show that it reduces the amount of data migrated for elastic resizing by up to two orders of magnitude, and show that it cuts the percentage of active servers required by 67–82%, outdoing state-of-the-art designs by 6–120%.

KEYWORDS: Cloud storage, distributed file systems, elastic storage, agility, power, write offloading