This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares ...
Ben Vandiver, Hari Balakrishnan, Barbara Liskov, S...
The design of safety-critical systems has typically adopted static techniques to simplify error detection and fault tolerance. However, economic pressure to reduce costs is exposi...
A case study of performance and dependability evaluation of fault-tolerant multiprocessors is presented. Two specific architectures are analyzed taking into account system functio...
The Reusable Software Fault Tolerance Testbed ReSoFT was developed to facilitate the development and evaluation of high-assurance systems that require tolerance of both hardware...
Kam S. Tso, Eltefaat Shokri, Roger J. Dziegiel Jr.
The productivity of HPC systems is determined not only by their performance but also by their reliability. The conventional method to limit the impact of failures is checkpointing...