ISCA
2012
IEEE

LOT-ECC: Localized and tiered reliability mechanisms for commodity memory systems

Memory system reliability is a serious and growing concern in modern servers. Existing chipkill-level memory protection mechanisms suffer from several drawbacks. They activate a large number of chips on every memory access, which increases energy consumption and reduces performance by limiting rank-level parallelism. They also increase access granularity, which wastes bandwidth when access locality is insufficient. Finally, they restrict systems to narrow-I/O x4 devices, which are known to be less energy-efficient than the wider x8 DRAM devices. In this paper, we present LOT-ECC, a localized and multi-tiered protection scheme that attempts to solve these problems. We separate error detection from error correction, and use simple checksum and parity codes to provide strong fault tolerance while also simplifying implementation. Data and codes are localized to the same DRAM row to improve access efficiency. ...
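The split between detection and correction described in the abstract can be illustrated with a small sketch. This is not the paper's actual code layout or bit widths. It assumes a cache line split into equal-length per-chip segments, a hypothetical 8-bit additive checksum per segment for detection, and a byte-wise XOR parity across segments for correction. All function names are invented for illustration.

```python
from functools import reduce
from typing import List, Tuple


def checksum(segment: bytes) -> int:
    """Hypothetical 8-bit checksum: complement of the byte sum (detection tier)."""
    return (~sum(segment)) & 0xFF


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length segments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segments: List[bytes]) -> Tuple[List[int], bytes]:
    """Compute one checksum per segment and one XOR parity across all segments."""
    checksums = [checksum(s) for s in segments]
    parity = reduce(xor_bytes, segments)
    return checksums, parity


def read_line(segments: List[bytes], checksums: List[int], parity: bytes) -> List[bytes]:
    """Detect a bad segment via its checksum, then rebuild it from the parity."""
    bad = [i for i, (s, c) in enumerate(zip(segments, checksums)) if checksum(s) != c]
    if not bad:
        return list(segments)  # common case: the parity is never consulted
    if len(bad) > 1:
        raise ValueError("more than one faulty segment; XOR parity cannot recover")
    failed = bad[0]
    survivors = [s for j, s in enumerate(segments) if j != failed]
    # XOR of the parity with every surviving segment yields the lost segment.
    repaired = list(segments)
    repaired[failed] = reduce(xor_bytes, survivors, parity)
    return repaired
```

In this sketch the checksums do two jobs: they flag that an error occurred and identify which segment failed. That is why a plain XOR parity is enough to rebuild the data, and why the correction code is only needed on the rare erroneous read.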
Added 28 Sep 2012
Updated 28 Sep 2012
Type Conference
Year 2012
Where ISCA
Authors Aniruddha N. Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, Al Davis, Norman P. Jouppi