
CGO
2009
IEEE

ESoftCheck: Removal of Non-vital Checks for Fault Tolerance

As semiconductor technology scales into the deep submicron regime, the occurrence of transient, or soft, errors will increase, which will require new approaches to error detection. Software checking approaches are attractive because they require little hardware modification and can be easily adjusted to fit different reliability and performance requirements. Unfortunately, software checking adds significant performance overhead. In this paper we present ESoftCheck, a set of compiler optimization techniques that determine which checks are vital, that is, the minimum number of checks necessary to detect an error and roll back to a correct program state. ESoftCheck identifies the vital checks on platforms where registers are hardware-protected with parity or ECC, when there are redundant checks, and when checks appear in loops. ESoftCheck also provides knobs to trade reliability for performance based on the support for recovery and the degree of trustiness of the operations...
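To make the notion of redundant checks in loops concrete, here is a minimal Python sketch. It is not the ESoftCheck algorithm and is not taken from the paper: the names `check`, `run`, and `SoftErrorDetected`, the duplicated-sum workload, and the bit-flip fault model are all assumptions made for illustration. It assumes a software-checking scheme that keeps a shadow copy of each value, and it shows that when a fault in the running sum carries through to the final value, a single check after the loop detects the same fault as a check in every iteration.

```python
# Hypothetical illustration only; not the ESoftCheck implementation.

class SoftErrorDetected(Exception):
    """Raised when the original and shadow copies disagree."""


def check(value, shadow):
    # A software check: compare the original value with its duplicate.
    if value != shadow:
        raise SoftErrorDetected(f"{value} != {shadow}")


def run(xs, check_every_step, flip_at=None):
    """Sum xs twice (original and shadow) and count the checks executed.

    flip_at: loop index at which a single-bit flip is injected into the
    original copy only, standing in for a transient fault.
    """
    total = shadow_total = 0
    checks = 0
    for i, x in enumerate(xs):
        total += x
        shadow_total += x
        if flip_at == i:
            total ^= 1 << 3  # inject the fault into the original copy
        if check_every_step:
            checks += 1
            check(total, shadow_total)
    # One check before the result leaves the computation.
    checks += 1
    check(total, shadow_total)
    return total, checks
```

In a fault-free run over four elements, the per-iteration variant executes five checks and the final-check-only variant executes one, and both return the same sum. With a fault injected in any iteration, both variants raise `SoftErrorDetected`; the second simply detects it later, which is acceptable only if the program can still roll back to a correct state at that point.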
Added 18 May 2010
Updated 18 May 2010
Type Conference
Year 2009
Where CGO
Authors Jing Yu, María Jesús Garzarán, Marc Snir