PARALLEL DATA LAB 

PDL Abstract

Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes

Carnegie Mellon University School of Computer Science Technical Report CMU-CS-12-103. February 2012.

Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

We describe a new mechanism for cloud computing that enables near-real-time monitoring of virtual disk write streams across an entire cloud. Our solution has low I/O overhead for the guest VM, low latency to file-level mutation notification, and a layered design for scalability. We achieve low I/O overhead by duplicating the virtual disk write stream as it passes through the managing VMM. We achieve low latency by performing semantic inference at the highest level possible: the file level. We achieve cloud scale by layering our design so that each layer can filter file-level mutations, minimizing network traffic to centralized monitoring infrastructure. We assume this technique is used on pre-indexed virtual disks, most likely derived from a cooperating VM image library such as those used in clouds today. Our new cloud primitive enables system administration tasks that involve monitoring files (virus scanning, log file parsing, etc.) to be performed outside the running VM instance, either on the VMM host or at a central monitoring agent.
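
As a rough illustration of the idea, and not the implementation from the report, the following Python sketch maps a duplicated disk write to the files it touches using a pre-computed block index, then filters the result the way one layer might before forwarding it upstream. The 4096-byte block size, the DiskWrite record, the block_index dictionary, and the helper names infer_mutations and filter_layer are all assumptions made for this example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

BLOCK_SIZE = 4096  # assumed file system block size in bytes


@dataclass(frozen=True)
class DiskWrite:
    """One write copied from the virtual disk write stream (hypothetical shape)."""
    offset: int  # byte offset into the virtual disk
    length: int  # number of bytes written


def infer_mutations(write, block_index):
    """Return the sorted paths of indexed files whose blocks overlap the write."""
    if write.length <= 0:
        return []
    first = write.offset // BLOCK_SIZE
    last = (write.offset + write.length - 1) // BLOCK_SIZE
    paths = {block_index[b] for b in range(first, last + 1) if b in block_index}
    return sorted(paths)


def filter_layer(paths, patterns):
    """Keep only the paths that match a pattern a higher layer subscribed to."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]


# Toy pre-computed index (block number -> owning file), standing in for
# the index that a cooperating VM image library might provide.
block_index = {
    10: "/var/log/syslog",
    11: "/var/log/syslog",
    12: "/etc/passwd",
    20: "/home/user/notes.txt",
}

# A write that starts inside block 11 and spills into block 12.
write = DiskWrite(offset=11 * BLOCK_SIZE + 100, length=BLOCK_SIZE)
touched = infer_mutations(write, block_index)
forwarded = filter_layer(touched, ["/var/log/*"])
print(touched, forwarded)
```

The sketch ignores file system metadata writes and blocks allocated after indexing, both of which a real monitor would have to track to keep the index current.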

KEYWORDS: Block write, cloud, cloud computing, file-level, inference, introspection, kernel virtual machine, KVM, monitoring, near-real-time, real-time, semantic, virtual disk, virtual disk write, virtual machine, VM, virtual machine introspection, VMI, virtual machine monitor, VMM

FULL TR: pdf