-- A hardware fault tolerance scheme for large multicomputers executing time-consuming non-interactive applications is described. Error detection and recovery are done mostly by so...
This paper describes a single-version algorithmic approach to design in fault tolerant computing in various computing systems by using static redundancy in order to mask transient...
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose, however wi...
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
The Reusable Software Fault Tolerance Testbed ReSoFT was developed to facilitate the development and evaluation of high-assurance systems that require tolerance of both hardware...
Kam S. Tso, Eltefaat Shokri, Roger J. Dziegiel Jr.