The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck
This paper presents a novel circuit fault detection and isolation technique for quasi delay-insensitive asynchronous circuits. We achieve fault isolation by a combination of physi...
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...
The proposed software technique is a very low cost and an effective solution towards designing Byzantine fault tolerant computing application systems that are not so safety critic...