Sciweavers

234 search results - page 7 / 47
» Optimal recovery schemes in fault tolerant distributed compu...
EDCC
2008
Springer
13 years 10 months ago
A Distributed Approach to Autonomous Fault Treatment in Spread
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
Hein Meling, Joakim L. Gilje
ISADS
2003
IEEE
14 years 1 months ago
Message Logging and Recovery in Wireless CORBA Using Access Bridge
The emerging mobile wireless environment poses exciting challenges for distributed fault tolerant (FT) computing. This paper proposes a message logging and recovery protocol on the...
Xinyu Chen, Michael R. Lyu
HPDC
2011
IEEE
13 years 5 days ago
Algorithm-based recovery for iterative methods without checkpointing
In today’s high performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can often be applied...
Zizhong Chen
CCGRID
2010
IEEE
13 years 9 months ago
Selective Recovery from Failures in a Task Parallel Programming Model
We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
EDCC
2006
Springer
14 years 3 days ago
SEU Mitigation Techniques for Microprocessor Control Logic
The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...