Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used d...
Ilias Iliadis, Robert Haas, Xiao-Yu Hu, Evangelos ...
To address the increasing susceptibility of commodity chip multiprocessors (CMPs) to transient faults, we propose Chiplevel Redundantly Threaded multiprocessor with Recovery (CRTR...
Mohamed A. Gomaa, Chad Scarbrough, Irith Pomeranz,...
For a space compactor, degradation of fault detection capability caused by the masking effects from unknown values is much more serious than that caused by error masking (i.e. ali...
Concentration of design effort for current single-chip Commercial-Off-The-Shelf (COTS) microprocessors has been directed towards performance. Reliability has not been the primary ...
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...