PARALLEL DATA LAB 

PDL Abstract

Using Utility Functions to Control a Distributed Storage System

Carnegie Mellon University, Department of Electrical and Computer Engineering Ph.D. Dissertation,
CMU-PDL-08-102. May 2008.

John D. Strunk

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu

Provisioning, and later optimizing, a storage system involves an extensive set of trade-offs between
system metrics, including purchase cost, performance, reliability, availability, and power. Previous
work has tried to simplify provisioning and tuning tasks by allowing a system administrator to
specify goals for various storage metrics. While this helps by raising the level of specification from
low-level mechanisms to high-level storage system metrics, it does not permit trade-offs between
those metrics.

This dissertation goes beyond goal-based requirements by allowing the system administrator to
use a utility function to specify his objectives. Using utility, both the costs and benefits of configuration
and tuning decisions can be examined within a single framework. This permits a provisioning
system to make automated trade-offs across system metrics, such as performance, data protection,
and power consumption. It also allows an automated optimization system to properly balance the
cost of data migration with its expected benefits.
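The difference between goal-based and utility-based specification can be illustrated with a small sketch. Everything in it is a hypothetical assumption, not the dissertation's tool or its utility model: the `Config` fields, the goal thresholds, and the dollar conversion rates between metrics. Both candidate configurations pass the goals, so goals alone cannot rank them. A single utility value, expressed here in dollars per year, can rank them because it prices each trade-off.

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    iops: float              # performance (hypothetical metric)
    annual_loss_prob: float  # data protection: chance of losing data in a year
    watts: float             # power draw
    purchase_cost: float     # dollars

def meets_goals(c: Config) -> bool:
    # Goal-based spec: pass/fail thresholds (hypothetical values).
    return c.iops >= 1000 and c.annual_loss_prob <= 1e-4

def utility(c: Config) -> float:
    # Utility-based spec: every metric converted to dollars per year
    # using hypothetical exchange rates between metrics.
    benefit = 50.0 * c.iops                            # $50/yr per sustained IOPS
    expected_loss = 1_000_000.0 * c.annual_loss_prob   # $1M cost per data-loss event
    energy = 1.0 * c.watts                             # about $1/yr per watt
    capital = c.purchase_cost / 3.0                    # 3-year straight-line amortization
    return benefit - expected_loss - energy - capital

candidates = [
    Config("A", iops=1200, annual_loss_prob=5e-5, watts=800, purchase_cost=60_000),
    Config("B", iops=1500, annual_loss_prob=1e-4, watts=1000, purchase_cost=90_000),
]

# Both candidates satisfy the goals; utility breaks the tie by weighing
# B's extra performance against its extra cost, power, and risk.
best = max(candidates, key=utility)
print(best.name)  # B
```

Under these assumed rates, B's additional performance outweighs its higher purchase cost, power draw, and loss risk. Different rates would shift the choice, which is exactly the kind of trade-off a goal threshold cannot express.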

This work develops a prototype storage provisioning tool that uses an administrator-specified
utility function to generate cost-effective storage configurations. The tool is then used to provide
examples of how utility can be used to balance competing objectives (e.g., performance and data
protection) and to provide guidance in the presence of external constraints. A framework for using
utility to evaluate data migration is also developed. This framework balances data migration costs
(reductions in current system metrics) against the potential benefits by discounting future expected
utility. Experiments show that, by looking at utility over time, it is possible to choose the migration
speed as well as weigh alternative optimization choices to provide the proper balance of current and
future levels of service.
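The idea of choosing a migration speed by discounting future utility can be sketched with a toy model. It rests entirely on hypothetical assumptions and is not the dissertation's framework. Utility accrues per hour. A faster migration finishes sooner but lowers foreground utility while it runs; the drop is modeled here as `penalty * speed**2`, an assumed contention model. Future utility is discounted at an assumed hourly rate.

```python
import math

def discounted_utility(speed, u_old=100.0, u_new=150.0, penalty=200.0,
                       work_hours=10.0, rate=0.01, horizon=200):
    """Discounted utility of migrating at `speed` (0 < speed <= 1).

    All parameters are hypothetical: the migration takes work_hours / speed
    hours, during which foreground utility drops by penalty * speed**2 per
    hour; afterwards the improved configuration yields u_new per hour.
    """
    duration = math.ceil(work_hours / speed - 1e-9)  # tolerance guards float error
    total = 0.0
    for t in range(horizon):
        per_hour = u_old - penalty * speed ** 2 if t < duration else u_new
        total += per_hour / (1.0 + rate) ** t  # discount future utility
    return total

def no_migration_utility(u_old=100.0, rate=0.01, horizon=200):
    # Baseline: keep the current configuration for the whole horizon.
    return sum(u_old / (1.0 + rate) ** t for t in range(horizon))

speeds = [0.1, 0.25, 0.5, 1.0]
best = max(speeds, key=discounted_utility)
print(best)  # 0.5 under these assumed parameters
```

With these assumptions, migrating at full speed costs too much current service, while migrating slowly delays the benefit too long. The intermediate speed maximizes discounted utility, and every speed beats not migrating at all.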
