PARALLEL DATA LAB 

PDL Abstract

Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories

ACM Transactions on Architecture and Code Optimization, Vol. 11, No. 4, Article 40, December 2014.

Hanbin Yoon, Justin Meza, Naveen Muralimanohar^, Norman P. Jouppi*, Onur Mutlu

Carnegie Mellon University
^ Hewlett-Packard Labs
* Google Inc.

http://www.pdl.cmu.edu/

New phase-change memory (PCM) devices have low access latencies (like DRAM) and high capacities (i.e., low cost per bit, like Flash). In addition to scaling to smaller cell sizes than DRAM, a PCM cell can also store multiple bits per cell (referred to as multilevel cell, or MLC), further increasing capacity and lowering cost per bit. However, reading and writing the different bits of an MLC PCM cell takes different amounts of time: one bit is read or written first, followed by the other. Due to this asymmetric access process, the bits in an MLC PCM cell have different access latencies and energies depending on which bit in the cell is being read or written.
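To make the asymmetry concrete, the Python sketch below models each two-bit MLC PCM cell as exposing a fast-read/slow-write bit and a slow-read/fast-write bit. This is an illustrative sketch only; the latency values are hypothetical placeholders, not measurements from the paper.

# Illustrative model of asymmetric MLC PCM bit access.
# Latency values are hypothetical placeholders, not measured numbers.
LATENCY_NS = {
    "fast_read_slow_write": {"read": 50, "write": 300},
    "slow_read_fast_write": {"read": 100, "write": 150},
}

def access_latency(bit_class, op):
    """Return the (hypothetical) latency of one read or write to a bit class."""
    return LATENCY_NS[bit_class][op]

# Reading the fast-read bit is cheaper than reading the slow-read bit,
# while the opposite holds for writes.
print(access_latency("fast_read_slow_write", "read"))   # 50
print(access_latency("slow_read_fast_write", "write"))  # 150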

We leverage this observation to design a new way to store and buffer data in MLC PCM devices. While traditional devices couple the bits in each cell, placing them next to one another in the address space, our key idea is to logically decouple the bits in each cell into two separate regions based on their read/write characteristics: fast-read/slow-write bits and slow-read/fast-write bits. We propose a low-overhead hardware/software technique to predict and map, at runtime, the data that would benefit from being in each region. In addition, we show how MLC bit decoupling provides more flexibility in how data is buffered in the device, enabling more efficient use of existing device buffer space.
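As a rough illustration of the mapping idea, the sketch below classifies a page by its observed read/write mix and places it in the matching decoupled region. The counters, threshold, and region names are assumptions for illustration, not the paper's actual prediction mechanism.

from dataclasses import dataclass

@dataclass
class PageStats:
    reads: int = 0
    writes: int = 0

def choose_region(stats, read_bias=0.6):
    """Pick a decoupled bit region for a page.

    Read-dominated pages go to the fast-read/slow-write bits; write-heavy
    pages go to the slow-read/fast-write bits. The 0.6 threshold is a
    hypothetical tuning parameter, not a value from the paper.
    """
    total = stats.reads + stats.writes
    if total == 0:
        return "fast_read_slow_write"  # default placement for cold pages
    if stats.reads / total >= read_bias:
        return "fast_read_slow_write"
    return "slow_read_fast_write"

# Example: a page seeing 90 reads and 10 writes lands in the fast-read region.
print(choose_region(PageStats(reads=90, writes=10)))  # fast_read_slow_write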

Our evaluations of a multicore system show that MLC bit decoupling improves system performance by 19.2%, memory energy efficiency by 14.4%, and thread fairness by 19.3% over a state-of-the-art MLC PCM system that couples the bits in its cells. We show that our results are consistent across a variety of workloads and system configurations.

FULL PAPER: pdf