Service Oriented Architectures (SOAs) have emerged as a preferred solution to tackle the complexity of large-scale, complex, distributed, and heterogeneous systems. Key to success...
—Considerable work has been done on providing fault tolerance capabilities for different software components on largescale high-end computing systems. Thus far, however, these fa...
Rinku Gupta, Pete Beckman, Byung-Hoon Park, Ewing ...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
Record and Replay (RR) is a software based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes