DATE: Thursday, January 28, 2010
TIME: 12:00 pm - 1:00 pm
PLACE: Gates Center 8102

Vasanth (Vas) Bala, IBM Research

Virtual Machine Images as Structured Data: the Mirage project at IBM

As the adoption of hardware virtualization grows in the industry, the use of virtual machine images for checkpointing and distributing fully operational software environments is also increasing. The convenience they offer as an alternative to install packages (and tedious install instructions) for distributing software makes virtual machine images a potentially disruptive force in the multi-billion dollar software industry.

But virtual machine images also create new problems. They tend to be large and clunky to move around. They proliferate rapidly (simply copy the image file). And there is a diversity of image disk formats and hypervisor platform idiosyncrasies to support. The typical IT administrator's knee-jerk reaction to dealing with these problems is to place limitations on what anyone in their organization can do with images. And their only use case for virtual machine images is deploying them onto servers as quickly as possible. What a bore.

Mirage is a technology under development at IBM Research that advocates a radically different approach. Think of a virtual machine image as a dormant machine whose contents can be introspected and manipulated offline. This could change how machines are secured (virus-scan offline images instead of online systems), how machines are updated (no more planned downtime for patches), and a whole lot of other things. In essence, it can enable many online IT maintenance processes to move offline, resulting in potentially huge cost savings. Mirage does this by indexing and storing the file contents within an image (treating it as structured file system data) in a way that enables very efficient offline operations, instead of using the conventional disk representation (an unstructured collection of blocks, some of which do not even contain valid data).

This approach runs into a multitude of problems, however. For example, most patches are designed to work on running systems (e.g. they may start processes), there is a diversity of file systems, and hypervisors execute disks, not file systems. These are some of the research problems that the Mirage project has been tackling. While Mirage is still a work in progress, it is already operational on a large scale within IBM as an enterprise-wide, geographically distributed virtual machine image library used in the IBM Research Compute Cloud. It is also a key component of several new products announced by IBM for enterprise data center management. We are also collaborating with several academic and research institutions to build a massive public-domain virtual machine image library. In this talk, I will present the motivation for Mirage, the current state of the system, and the challenges ahead.

Vasanth (Vas) Bala is a senior member of the research staff at the IBM Thomas J. Watson Research Center in New York, where he leads the Virtualization Runtime and Tools department. He is also responsible for the technology strategy related to the management of virtualized environments for the IBM Research division. Prior to IBM, Vas worked as a senior scientist at HP Labs and co-founded Liquid Machines, now a successful company in the Boston area.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit