Many mathematical models have been proposed to evaluate the execution performance of an application with and without checkpointing in the presence of failures. They assume that th...
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
Incremental checkpointing is a cost-efficient fault-tolerant technique for long-running programs such as genetic algorithms. In this paper, we derive the equations for the writing...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
This paper proposes a lightweight checkpointing scheme for real-time embedded systems. The goal is to separate concerns by allowing applications to take checkpoints independently ...