Sciweavers

TC
2010

PERFECTORY: A Fault-Tolerant Directory Memory Architecture

The number of CPUs in chip multiprocessors is growing at the Moore’s Law rate, due to continued technology advances. However, new technologies pose serious reliability challenges, such as more frequent occurrences of degraded or even nonoperational devices, and they threaten the cost-effectiveness and dependability of future computing systems. This work studies how to protect the on-chip coherence directory from faults. In a chip multiprocessor, cache coherence mechanisms such as directory memory are critical for offering a consistent view of data to all CPUs. We propose a novel online fault detection and correction scheme that enhances yield and resilience to runtime errors at a small performance cost. The proposed scheme uses smart encoding and coherence protocol adaptation strategies to salvage faulty directory entries. We also develop an online error recovery scheme that protects the directory memory from soft errors. We call our fault-tolerant directory memory architecture...
Hyunjin Lee, Sangyeun Cho, Bruce R. Childers
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where TC