Sciweavers

» Cache based fault recovery for distributed systems
EUROPAR
2005
Springer
14 years 27 days ago
Self-stabilizing Publish/Subscribe Systems: Algorithms and Evaluation
Most research in the area of publish/subscribe systems has not considered fault-tolerance as a central design issue. However, faults do obviously occur and masking all faults is a...
Gero Mühl, Michael A. Jaeger, Klaus Herrmann,...
SRDS
1994
IEEE
13 years 11 months ago
Coordinated Checkpointing-Rollback Error Recovery for Distributed Shared Memory Multicomputers
Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require unnecessarily high checkpointing frequency and checkpoint traffic, which are sens...
G. Janakiraman, Yuval Tamir
PODC
1998
ACM
13 years 11 months ago
Persistent Messages in Local Transactions
We present a new model for handling messages and state in a distributed application that we call Messages in Local Transactions (MLT). Under this model, messages and data are not...
David E. Lowell, Peter M. Chen
PADS
1998
ACM
13 years 11 months ago
Fault-Tolerant Distributed Simulation
In traditional distributed simulation schemes, the entire simulation needs to be restarted if any of the participating LPs crashes. This is highly undesirable for long running simulati...
Om P. Damani, Vijay K. Garg
CLUSTER
2004
IEEE
13 years 11 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...