Sciweavers

Systems Failures
LADC
2011
Springer
12 years 12 months ago
Byzantine Fault-Tolerant Deferred Update Replication
Replication is a well-established approach to increasing database availability. Many database replication protocols have been proposed for the crash-stop failure model, ...
Fernando Pedone, Nicolas Schiper, José Enri...
ICAC
2005
IEEE
14 years 2 months ago
Distributed Troubleshooting Agents
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
DSOM
2004
Springer
14 years 2 months ago
ABHA: A Framework for Autonomic Job Recovery
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown...
PRDC
2007
IEEE
14 years 3 months ago
Implementation of a Flexible Membership Protocol on a Real-Time Ethernet Prototype
This paper describes the implementation of a processor-group membership protocol in an experimental real-time network. The protocol is appropriate for fault-tolerant distributed sy...
Raul Barbosa, António Ferreira, Johan Karls...
OSDI
2004
ACM
14 years 9 months ago
Microreboot - A Technique for Cheap Recovery
A significant fraction of software failures in large-scale Internet systems are cured by rebooting, even when the exact failure causes are unknown. However, rebooting can be expen...
George Candea, Shinichi Kawamoto, Yuichi Fujiki, G...