PARALLEL DATA LAB 

PDL Abstract

Design Tradeoffs in Applying Content Addressable Storage to Enterprise-scale Systems Based on Virtual Machines

Proceedings of the 2006 USENIX Annual Technical Conference (USENIX '06), Boston, Massachusetts, May-June 2006.

Partho Nath†, Michael A. Kozuch*, David R. O’Hallaron‡, Jan Harkes‡, M. Satyanarayanan‡,
Niraj Tolia‡, Matt Toups‡

†Penn State University
*Intel Research Pittsburgh
‡Carnegie Mellon University

http://www.pdl.cmu.edu/

This paper analyzes the usage data from a live deployment of an enterprise client management system based on virtual machine (VM) technology. Over a period of seven months, twenty-three volunteers used VM-based computing environments hosted by the system and created over 800 checkpoints of VM state, where each checkpoint included the virtual memory and disk states. Using this data, we study the design tradeoffs in applying content addressable storage (CAS) to such VM-based systems. In particular, we explore the impact on storage requirements and network load of different privacy properties and data granularities in the design of the underlying CAS system. The study clearly demonstrates that relaxing privacy can reduce the resource requirements of the system, and identifies designs that provide reasonable compromises between privacy and resource demands.
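
To make the tradeoff concrete, the following is a minimal sketch of a content addressable store, not the system studied in the paper. It assumes fixed-size chunks named by their SHA-256 digest. The names `put`, `get`, `stored_bytes`, and `owner_key` are hypothetical, and prefixing a per-user key to the hashed data is only a toy stand-in for a stricter privacy policy: it prevents identical chunks from different users from sharing one stored copy.

```python
import hashlib


def put(store, data, chunk_size, owner_key=b""):
    """Split data into fixed-size chunks and store each under its digest.

    owner_key is a hypothetical per-user secret. When it differs between
    users, identical chunks get different tags and are stored separately.
    Returns the list of tags (a "recipe") needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        tag = hashlib.sha256(owner_key + chunk).hexdigest()
        store.setdefault(tag, chunk)  # keep only the first copy of a chunk
        recipe.append(tag)
    return recipe


def get(store, recipe):
    """Rebuild the original data from its recipe."""
    return b"".join(store[tag] for tag in recipe)


def stored_bytes(store):
    """Total bytes of unique chunk data held by the store."""
    return sum(len(chunk) for chunk in store.values())


def block(value):
    """A 4 KiB block filled with one byte value (synthetic test data)."""
    return bytes([value]) * 4096


# Two synthetic 16 KiB "checkpoints" that differ in one 4 KiB block.
checkpoint_a = block(1) + block(2) + block(3) + block(4)
checkpoint_b = block(1) + block(2) + block(9) + block(4)
```

With these synthetic checkpoints, storing both at 4 KiB granularity with no owner key keeps 20480 bytes, because three of the four blocks are shared. Storing them as whole 16 KiB objects, or at 4 KiB granularity under two different owner keys, keeps 32768 bytes in each case, since nothing is shared. The figures illustrate the direction of the effect only and say nothing about the magnitudes measured in the deployment.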
