Sciweavers

76 search results - page 11 / 16
» A Software-Based Hardware Fault Tolerance Scheme for Multico...
Sort
View
SIGMETRICS
2008
ACM
121views Hardware» more  SIGMETRICS 2008»
13 years 7 months ago
Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems
Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used d...
Ilias Iliadis, Robert Haas, Xiao-Yu Hu, Evangelos ...
ISCA
2003
IEEE
136views Hardware» more  ISCA 2003»
14 years 1 months ago
Transient-Fault Recovery for Chip Multiprocessors
To address the increasing susceptibility of commodity chip multiprocessors (CMPs) to transient faults, we propose Chiplevel Redundantly Threaded multiprocessor with Recovery (CRTR...
Mohamed A. Gomaa, Chad Scarbrough, Irith Pomeranz,...
DAC
2006
ACM
14 years 9 months ago
Unknown-tolerance analysis and test-quality control for test response compaction using space compactors
For a space compactor, degradation of fault detection capability caused by the masking effects from unknown values is much more serious than that caused by error masking (i.e. ali...
Mango Chia-Tso Chao, Kwang-Ting Cheng, Seongmoon W...
DSN
2002
IEEE
14 years 26 days ago
Ditto Processor
Concentration of design effort for current single-chip Commercial-Off-The-Shelf (COTS) microprocessors has been directed towards performance. Reliability has not been the primary ...
Shih-Chang Lai, Shih-Lien Lu, Jih-Kwon Peir
ISCA
2002
IEEE
115views Hardware» more  ISCA 2002»
14 years 25 days ago
SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...