This paper presents the results from running five experiments with the Chime Parallel Processing System. The Chime System is an implementation of the CC++ programming language (p...
Anjaneya R. Chagam, Partha Dasgupta, Rajkumar Khan...
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...
Grid applications have been prone to encountering problems such as failures or malicious attacks during execution, due to their distributed and large-scale features. The applicati...
Xuanhua Shi, Jean-Louis Pazat, Eric Rodriguez, Hai...
Programmers and users of compute intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...
Dependable distributed applications require flexible infrastructure support for controlled redundancy, replication, and recovery of components and services. However, mos...