Fault tolerance (FT) has become a major concern in computing systems. Instruction duplication has been proposed to verify application execution at run time. Two techniques, instruction memoization and precomputation, have been shown to improve the performance and fault coverage of duplication. This work shows that the combination of these two techniques is much more powerful than either one in isolation. In addition to improving performance, the combined scheme improves long-lasting transient and permanent fault coverage relative to the memoization scheme. Compared to the precomputation scheme, it reduces the long-lasting transient and permanent fault coverage for 10.6% of the instructions, but covers 2.6 times as many instructions against shorter transient faults. On a system with 2 integer ALUs, the combined scheme reduces the performance degradation due to duplication by 27.3% and 22.2% on average compared to the precomputation-based and memoization-based techniques, respectively, with similar hardware requirements.
Demid Borodin, Ben H. H. Juurlink