Distributed applications can fail in subtle ways that depend on the state of multiple parts of a system. This complicates the validation of such systems via fault injection, since...
Ramesh Chandra, Ryan M. Lefever, Michel Cukier, Wi...
Validating distributed systems is particularly difficult, since failures may occur due to a correlated occurrence of faults in different parts of the system. This paper describes ...
Michel Cukier, Ramesh Chandra, David Henke, Jessic...
The ability to guarantee that a system will continue to operate correctly under degraded conditions is key to the success of adopting multi-agent systems (MAS) as a paradigm for d...
The DepAuDE architecture provides middleware to integrate fault tolerance support into distributed embedded automation applications. It allows error recovery to be expressed in te...
Geert Deconinck, Vincenzo De Florio, Ronnie Belman...
We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...