Sciweavers

» Fault-tolerant solutions for a MPI compute intensive applica...
IEEESCC
2008
IEEE
14 years 1 month ago
A Fault Tolerance Approach for Enterprise Applications
Service Oriented Architectures (SOAs) have emerged as a preferred solution to tackle the complexity of large-scale, complex, distributed, and heterogeneous systems. Key to success...
Vina Ermagan, Ingolf Krüger, Massimiliano Men...
ICPP
2009
IEEE
14 years 2 months ago
CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems
Considerable work has been done on providing fault tolerance capabilities for different software components on large-scale high-end computing systems. Thus far, however, these fa...
Rinku Gupta, Pete Beckman, Byung-Hoon Park, Ewing ...
IPPS
2007
IEEE
14 years 1 month ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé
DSN
2002
IEEE
14 years 13 days ago
Generic Timing Fault Tolerance using a Timely Computing Base
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
Antonio Casimiro, Paulo Veríssimo
ICDCS
2007
IEEE
14 years 1 month ago
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes