PARALLEL DATA LAB 

PDL Abstract

Error Analysis and Retention-Aware Error Management for NAND Flash Memory

Intel Technology Journal (ITJ), Special Issue on Memory Resiliency, 2013.

Yu Cai, Gulay Yalcin*, Onur Mutlu, Erich F. Haratsch^, Adrian Cristal*, Osman Unsal*, Ken Mai

Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213

*Barcelona Supercomputing Center
^LSI Corporation

http://www.pdl.cmu.edu/

With continued scaling of NAND flash memory process technology and multiple bits programmed per cell, NAND flash reliability and endurance are degrading. In our research, we experimentally measure, characterize, analyze, and model error patterns in nanoscale flash memories. Based on the understanding developed using real flash memory chips, we design error management techniques that are more efficient and effective than the costly error correction codes traditionally used.

In this article, we summarize our major error characterization results and mitigation techniques for NAND flash memory. We first provide a characterization of errors that occur in 30- to 40-nm flash memories, showing that retention errors, caused by flash cells leaking charge over time, are the dominant source of errors. Second, we describe retention-aware error management techniques that aim to mitigate retention errors. The key idea is to periodically read, correct, and reprogram (in-place) or remap the stored data before it accumulates more retention errors than can be corrected by simple ECC. Third, we briefly touch upon our recent work that characterizes the distribution of the threshold voltages across different cells in a modern 20- to 24-nm flash memory, with the hope that such a characterization can enable the design of more effective and efficient error correction mechanisms to combat threshold voltage distortions that cause various errors. We conclude with a brief description of our ongoing related work in combating scaling challenges of both NAND flash memory and DRAM.
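
The periodic read, correct, and rewrite idea described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the mechanism evaluated in the article: the linear error-accumulation model, the parameter values, and all names (Page, refresh, simulate, ecc_limit, and so on) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy flash page: only a raw retention-error count is tracked."""
    errors: int = 0       # retention errors accumulated since the last refresh
    refreshes: int = 0    # number of successful refreshes
    lost: bool = False    # True once errors exceed what ECC can correct


def refresh(page: Page, ecc_limit: int) -> bool:
    """Read the page, correct it with ECC, and rewrite it.

    Returns False if the page already holds more errors than ECC can fix.
    """
    if page.errors > ecc_limit:
        page.lost = True
        return False
    page.errors = 0       # the rewritten data starts with no retention errors
    page.refreshes += 1
    return True


def simulate(total_days: int, refresh_interval: int,
             errors_per_day: int, ecc_limit: int) -> Page:
    """Age one page day by day; refresh_interval=0 disables refresh.

    Assumption: errors grow linearly with time, which is a simplification.
    """
    page = Page()
    for day in range(1, total_days + 1):
        page.errors += errors_per_day
        if page.errors > ecc_limit:
            page.lost = True          # a read now would be uncorrectable
            break
        if refresh_interval and day % refresh_interval == 0:
            refresh(page, ecc_limit)
    return page


with_refresh = simulate(total_days=365, refresh_interval=30,
                        errors_per_day=1, ecc_limit=40)
without_refresh = simulate(total_days=365, refresh_interval=0,
                           errors_per_day=1, ecc_limit=40)
```

In this sketch the refreshed page never exceeds the assumed ECC limit, while the unrefreshed page becomes uncorrectable once its error count passes that limit. A real controller would also have to weigh the extra program/erase wear that each refresh causes, which this toy model ignores.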

FULL PAPER: pdf