Memory Errors in Modern Systems: The Good, The Bad, and The Ugly



Several recent publications have shown that hardware faults in the memory subsystem are commonplace. These faults are predicted to become more frequent in future systems that contain orders of magnitude more DRAM and SRAM than found in current memory subsystems. These memory subsystems will need to provide resilience techniques to tolerate these faults when deployed in high-performance computing systems and data centers containing tens of thousands of nodes. Therefore, it is critical to understand the efficacy of current hardware resilience techniques to determine whether they will be suitable for future systems. In this paper, we present a study of DRAM and SRAM faults and errors from the field. We use data from two leadership-class high-performance computer systems to analyze the reliability impact of hardware resilience schemes that are deployed in current systems. Our study has several key findings about the efficacy of many currently...

Vilas Sridharan, Nathan DeBardeleben, Sean Blanchard, Kurt B. Ferreira, Jon Stearley, John Shalf, Sudhanva Gurumurthi


ASPLOS 2015 | Programming Languages |


Post Info

Added	16 Apr 2016
Updated	16 Apr 2016
Type	Conference
Year	2015
Where	ASPLOS
Authors	Vilas Sridharan, Nathan DeBardeleben, Sean Blanchard, Kurt B. Ferreira, Jon Stearley, John Shalf, Sudhanva Gurumurthi
