Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically, and in an arbitrary manner, can lead to the application not ...
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...
We present S, the first system to provide transparent, lowoverhead application record-replay and the ability to go live from replayed execution. S i...
—The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed a...