Abstract. As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of appl...
Maria Chtepen, Filip H. A. Claeys, Bart Dhoedt, Fi...
With the advent of Grid computing, more and more highend computational resources become available for use to a scientist. While this opens up new avenues for scientific research,...
This paper presents a new functionality of the Automatic Differentiation (AD) Tool tapenade. tapenade generates adjoint codes which are widely used for optimization or inverse prob...
Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) guarantee important properties to the recovery system without explicit coordinatio...
Rodrigo Schmidt, Islene C. Garcia, Fernando Pedone...
We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...