Sciweavers

1256 search results - page 3 / 252
» On Coordinated Checkpointing in Distributed Systems
Sort
View
ICDCS
2005
IEEE
14 years 1 months ago
Optimal Asynchronous Garbage Collection for RDT Checkpointing Protocols
Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) guarantee important properties to the recovery system without explicit coordinatio...
Rodrigo Schmidt, Islene C. Garcia, Fernando Pedone...
CLUSTER
2004
IEEE
13 years 11 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
PVLDB
2008
110views more  PVLDB 2008»
13 years 7 months ago
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...
IPPS
1999
IEEE
13 years 12 months ago
The Performance of Coordinated and Independent Checkpointing
Checkpointing is a very effective technique to tolerate the occurrence of failures in distributed and parallel applications. The existing algorithms in the literature are basicall...
Luís Moura Silva, João Gabriel Silva
IPPS
2006
IEEE
14 years 1 months ago
Coordinated checkpoint from message payload in pessimistic sender-based message logging
Execution of MPI applications on Clusters and Grid deployments suffers from node and network failure that motivates the use of fault tolerant MPI implementations. Two category tec...
M. Aminian, Mohammad K. Akbari, Bahman Javadi