Operating system lockup errors can render a computer unusable by preventing the execution of other programs. Watchdog timers can be used to recover from a lockup by resetting the pro...
Francis M. David, Jeffrey C. Carlyle, Roy H. Campb...
Error detection plays an important role in fault-tolerant computer systems. The two primary parameters of error detection are latency and coverage. In this paper, a ne...
Modern computer systems are becoming more powerful and are using larger memories. However, except for very high-end systems, little attention is being paid to high availability. T...
DeQing Chen, Alan Messer, Philippe Bernadat, Guang...
Software system faults are often caused by unexpected interactions among components. Yet the size of a test suite required to test all possible combinations of interactions can be...
Myra B. Cohen, Peter B. Gibbons, Warwick B. Mugrid...
Due to reductions in device feature size and supply voltage, the sensitivity of digital systems to radiation-induced transient faults (soft errors) increases dramatically. Intensiv...