CMOS technology trends are leading to an increasing incidence of hard (permanent) faults in processors. These faults may be introduced at fabrication or occur in the field. Whereas high-performance processor cores have enough redundancy to tolerate many of these faults, the simple, low-power cores that are attractive for multicore chips do not. We propose Detouring, a software-based scheme for tolerating hard faults in simple cores. The key idea is to automatically modify software such that its functionality is unchanged but it does not use any of the faulty hardware. Our initial implementation of Detouring tolerates hard faults in several hardware components, including the instruction cache, registers, functional units, and the operand bypass network. Detouring has no hardware cost and no performance overhead for fault-free cores.
Albert Meixner, Daniel J. Sorin
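
The key idea stated in the abstract, rewriting software so that it behaves identically but never touches faulty hardware, can be illustrated for one of the listed components, a faulty register. The following is a minimal sketch under stated assumptions, not the authors' implementation: the toy three-operand instruction format, the register names, and the helpers `used_registers`, `detour_register`, and `run` are all hypothetical and introduced only for illustration. It assumes a spare register exists and that the faulty register is not a live-in input of the program.

```python
import operator
from typing import Dict, List, Tuple

# Hypothetical toy instruction: (opcode, destination, source1, source2).
Instr = Tuple[str, str, str, str]

# Hypothetical opcodes supported by the toy interpreter below.
OPS = {"add": operator.add, "mul": operator.mul}


def used_registers(program: List[Instr]) -> set:
    """Return every register named as a destination or source."""
    return {reg for _, dst, a, b in program for reg in (dst, a, b)}


def detour_register(program: List[Instr], faulty: str,
                    register_file: List[str]) -> List[Instr]:
    """Rename every use of `faulty` to a register the program never uses.

    Assumption: a whole-program rename onto an unused register preserves
    behavior. A realistic tool would also have to handle the case where
    no register is free (for example by spilling to memory).
    """
    in_use = used_registers(program)
    if faulty not in in_use:
        return list(program)  # nothing to detour around
    spares = [r for r in register_file if r != faulty and r not in in_use]
    if not spares:
        raise ValueError("no spare register available in this sketch")
    spare = spares[0]

    def swap(reg: str) -> str:
        return spare if reg == faulty else reg

    return [(op, swap(d), swap(a), swap(b)) for op, d, a, b in program]


def run(program: List[Instr], inputs: Dict[str, int]) -> Dict[str, int]:
    """Execute the toy program and return the final register values."""
    regs = dict(inputs)
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs


REGISTER_FILE = ["r0", "r1", "r2", "r3", "r4", "r5"]

# r3 = (r0 + r1) * (r0 + r1), with the intermediate sum held in r2.
program = [("add", "r2", "r0", "r1"),
           ("mul", "r3", "r2", "r2")]

# Suppose r2 is known to be faulty: rewrite the program to avoid it.
patched = detour_register(program, "r2", REGISTER_FILE)

inputs = {"r0": 2, "r1": 3}
assert "r2" not in used_registers(patched)
assert run(patched, inputs)["r3"] == run(program, inputs)["r3"]
```

In this sketch the rewritten program produces the same result in `r3` while never reading or writing `r2`. A fault-free core would simply run the unmodified program, which mirrors the abstract's claim that fault-free cores pay no performance overhead.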