Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long running jobs. Increasing fault tolera...
As the size and popularity of computer clusters go on growing, fault tolerance is becoming a crucial factor to ensure high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gon&ccedi...
Fault tolerant distributed protocols typically utilize a homogeneous fault model, either fail-crash or fail-Byzantine, where all processors are assumed to fail in the same manner....
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...
In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded sys...