DATE: Wednesday, October 23, 2013
TIME: Noon - 1:00 pm
PLACE: CIC - 4th floor (ISTC Panther Hollow Room)

SPEAKER: Ethan Miller, UCSC and Pure Storage

TITLE: Inside the Pure Storage Flash Array: Building a High Performance, Data Reducing Storage System from Commodity SSDs

ABSTRACT:
The storage industry is currently in the midst of a flash revolution. Today's smartphones, cameras, and many laptops all use flash storage, but the $30 billion a year enterprise storage market is still dominated by spinning disk. Flash has large advantages in speed and power consumption, but its disadvantages (price, limited overwrites, large erase block size) have prevented it from being a drop-in replacement for disk in a storage array. This talk will describe the techniques that we've developed at Pure Storage to overcome these obstacles in creating a high-performance flash storage array using commodity SSDs.

We will first explain what an enterprise storage array is and how it's used. We will then describe the design of the Pure FlashArray, an enterprise storage array built from the ground up on relatively inexpensive consumer flash storage. The array and its software, Purity, leverage the advantages of flash while minimizing the downsides. Purity performs all writes to flash in multiples of the erase block size, and keeps data in a key-value store that persists approximate answers to further reduce writes at the cost of extra (cheap) reads. Our key-value store, which includes a key range invalidation table, provides other advantages, such as the ability to take nearly instantaneous, zero-overhead snapshots and the ability to bound the size of our metadata structures despite using monotonically increasing unique identifiers for many purposes. Purity also reduces the amount of user data stored on flash through a range of techniques, including compression, deduplication, and thin provisioning. The system relies upon RAID both for reliability and for performance consistency: by avoiding reads to devices that are being written, we ensure more efficient writes and eliminate long-latency reads. The net result is a flash array that delivers sustained read-write performance of over 400,000 4KB I/O requests per second while maintaining uniform sub-millisecond latency and providing a data reduction rate in excess of 6x, averaged across installed systems.

FURTHER READING:
SILT: A Memory-Efficient, High-Performance Key-Value Store. Hyeontaek Lim, Bin Fan, David Andersen and Michael Kaminsky. ACM Symposium on Operating Systems Principles (SOSP'11), Cascais, Portugal, October 2011.

BIO:
Ethan L. Miller is a Professor of Computer Science at the University of California, Santa Cruz, where he is the Director of the NSF I/UCRC Center for Research in Storage Systems (CRSS) and Associate Director of the Storage Systems Research Center (SSRC). He received his ScB from Brown in 1987 and his PhD from UC Berkeley in 1995, and has been on the UC Santa Cruz faculty since 2000. He has written over 120 papers covering topics such as archival storage, file systems for high-end computing, metadata and information retrieval, file system performance, secure file systems, and distributed systems. He was a member of the team that developed Ceph, a scalable high-performance distributed file system for scientific computing that is now being adopted by several high-end computing organizations. His work on reliability and security for scalable and distributed storage is also widely recognized, as is his work on secure, efficient long-term archival storage and scalable metadata systems.

His current research projects, which are funded by the National Science Foundation, Department of Energy, and industry support for the CRSS and SSRC, include long-term archival storage systems, scalable metadata and indexing structures, high performance petabyte-scale storage systems, and file systems for non-volatile memory technologies. Prof. Miller's broader interests include file systems, parallel and distributed systems, operating systems, and computer security. In addition to research and teaching in storage systems and operating systems, Prof. Miller is currently working with Pure Storage to help bring affordable all-flash storage based on commodity SSDs to the enterprise.

VISITOR HOST: Garth Gibson
VISITOR COORDINATOR: Jennifer Landefeld (jennsbl@cs.cmu.edu)

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/