DATE: Thursday, October 11, 2012
TIME: Noon - 1:00 pm
PLACE: GHC 8102

SPEAKER: Healfdene Goguen, Google

TITLE: Network-Accessible Disks

ABSTRACT:
This talk presents D, Google's network-based storage node. D provides a simple, non-POSIX file API and a low-level file system used by Google's distributed file system, Colossus, and by big-data systems such as MapReduce. Drawing on the lessons learned from the original Google File System (GFS) chunkservers, D was designed to handle component failures transparently and to favor large files, and was tuned for common application access patterns such as append-only writes. D also incorporates a novel performance isolation system that allows latency-sensitive applications such as Gmail to share the same disks as big-data batch processing workloads like MapReduce, with batch workload I/O resulting in a predictable, bounded increase in the tail latency of time-critical requests. D incorporates efficiency gains over the GFS chunkserver, lowering the cost of storage. Today, over 75% of Google production data is stored in D and managed either by Colossus or directly by a small set of systems, such as MapReduce, that do not use a full-scale distributed file system.

BIO:
Since 2011, Healfdene Goguen has been the technical lead of Google's D storage project. Healfdene has also worked on Google's cluster management infrastructure and Bigtable. Before joining Google, Healfdene worked on voice-over-IP at AT&T Labs from 1999 to 2005. He received his undergraduate degree from CMU in 1988 and his PhD, in type theory, from the University of Edinburgh in 1994. He held postdoctoral positions at INRIA Sophia-Antipolis and the University of Edinburgh.

HOSTS:
Greg Ganger

VISITOR COORDINATOR: Karen Lindenfelser

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/