The paper describes a metaobject architecture for distributed fault-tolerant systems. Basically, metaobject protocols enable functional objects to be independent from meta-function...
A hardware fault tolerance scheme for large multicomputers executing time-consuming non-interactive applications is described. Error detection and recovery are done mostly by so...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...
Fault tolerance is one of the key issues for large-scale applications executed on high-performance computing systems. In a cluster federation, clusters are gathered to provide hug...