The DepAuDE architecture provides middleware to integrate fault tolerance support into distributed embedded automation applications. It allows error recovery to be expressed in te...
Geert Deconinck, Vincenzo De Florio, Ronnie Belman...
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
This workshop provides a forum for an overview, project presentations, and discussion of the research fostered and funded initially by the NSF Next Generation Software (NGS) Progr...
— Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
In order to show that the parallel co-evolution of di erent heuristic methods may lead to an e cient search strategy, we have hybridized three heuristic agents of complementary beh...