The problem of finding efficient workload distribution techniques is becoming increasingly important today for heterogeneous distributed systems where the availability of comp...
Vladimir Shestak, Edwin K. P. Chong, Anthony A. Ma...
This paper presents a method of estimating the availability of fault-tolerant computer systems with several recovery procedures. A segregated failures model has been proposed rece...
Sergiy A. Vilkomir, David Lorge Parnas, Veena B. M...
Failure detectors are a fundamental part of safe fault-tolerant distributed systems. Many failure detectors use heartbeats to draw conclusions about the state of nodes within a ...
Benjamin Satzger, Andreas Pietzowski, Wolfgang Tru...
Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means by which fai...
This article deals with stabilization and fault-tolerance. We consider two types of stabilization: self-stabilization and pseudo-stabilization. Our goal is to implement the self- and/...