Thursday, November 8, 2007
12:00 pm - 1:00 pm
NOTE SPECIAL LOCATION - WEAN HALL 8220
Jay J. Wylie
Hewlett Packard Labs
Finding Fault Tolerant XOR-based Erasure Codes for Storage
XOR-based erasure codes have had a tremendous impact on networked systems in the recent past, but their impact on clustered storage systems has not yet been felt: replication and RAID continue to dominate there. We believe that a clear understanding of XOR-based erasure codes as they apply to clustered storage systems, rather than networked systems, will facilitate their adoption.
Toward this end, we have identified a new fault tolerance metric for XOR-based erasure codes: the minimal erasures list (MEL). The MEL completely describes the fault tolerance of an XOR-based erasure code at and beyond its Hamming distance; it is therefore a useful metric for comparing the fault tolerance of such codes. We have also developed the ME Algorithm, which efficiently determines the MEL of an erasure code. We have used the ME Algorithm (with some extensions) to find the most fault-tolerant XOR-based erasure codes with up to seven data symbols and seven parity symbols. These codes are directly applicable in clustered storage systems today.
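To make the MEL idea concrete, here is a minimal brute-force sketch in Python. It is not the ME Algorithm from the talk, which is described as far more efficient. The sketch assumes the usual definition: a set of erased symbols is a minimal erasure if it leaves some data symbol unrecoverable and restoring any single erased symbol makes all data recoverable again. The function names (gf2_rank, minimal_erasures) and the bitmask encoding of parity equations are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = {}  # leading bit position -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # eliminate the leading bit
    return len(basis)

def minimal_erasures(k, parities):
    """Brute-force MEL of a code with k data symbols.

    parities: one bitmask per parity symbol; bit i set means
    data symbol i is XORed into that parity (hypothetical encoding).
    Symbols are indexed 0..k-1 (data), then k.. (parity).
    """
    symbols = [1 << i for i in range(k)] + list(parities)
    n = len(symbols)

    def decodable(erased):
        # All data is recoverable iff the survivors span GF(2)^k.
        survivors = (symbols[i] for i in range(n) if i not in erased)
        return gf2_rank(survivors) == k

    mel = []
    for size in range(1, n + 1):
        for erased in combinations(range(n), size):
            s = set(erased)
            # Minimal: fatal as a whole, but every erasure is necessary.
            if not decodable(s) and all(decodable(s - {x}) for x in s):
                mel.append(erased)
    return mel

# Two data symbols and one parity p = d0 XOR d1: any two losses are fatal.
print(minimal_erasures(2, [0b11]))  # [(0, 1), (0, 2), (1, 2)]
```

The size of the smallest entry in the returned list is the code's Hamming distance (2 in the example), and the larger entries describe fault tolerance beyond it. The exhaustive search is exponential in the number of symbols, so it is only practical for very small codes.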
Jay J. Wylie is a Research Scientist in the Storage Systems Department at Hewlett-Packard Labs. Jay's interests include erasure codes, distributed systems, storage systems, consistency protocols, (Byzantine) fault-tolerance, and dependability. Jay can be reached by email at email@example.com.
For more information, visit http://www.pdl.cmu.edu/SDI/