This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares ...
Ben Vandiver, Hari Balakrishnan, Barbara Liskov, S...
It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On th...
As high-performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
In recent years, exciting technological advances have been made in the development of flexible electronics. These technologies offer the opportunity to weave computation, communicat...
Roozbeh Jafari, Foad Dabiri, Philip Brisk, Majid S...
Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...