Sciweavers

CLUSTER
1999
IEEE

Simulative performance analysis of gossip failure detection for scalable distributed systems

Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means of detecting failures asynchronously in large distributed systems, without the limits associated with reliable multicasting for group communications. The hierarchical protocol leverages the underlying network topology to achieve faster failure detection. In addition to studying the effectiveness and efficiency of these two agreement protocols, we propose a third protocol that extends the hierarchical approach by piggybacking gossip information on application-generated messages. The protocols are simulated and evaluated with a fault-injection model for scalable distributed systems composed of clusters of workstations connected by high-performance networks, such as the CPlant machine at Sandia National Laboratories. The model supports permanent and transient node and link failures, with rates specifi...
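The abstract does not spell out how the basic gossip protocol works, so the following is only a minimal sketch of one common formulation of gossip-style failure detection, not the authors' simulator. It assumes each node keeps a heartbeat counter per known node, pushes its table to one randomly chosen peer per round, and suspects any node whose counter has not increased for more than `t_fail` rounds. The names `Node`, `simulate`, `t_fail`, and all parameter values are hypothetical and chosen for illustration.

```python
import random


class Node:
    """One participant in an assumed heartbeat-counter gossip scheme."""

    def __init__(self, node_id, all_ids, t_fail):
        self.node_id = node_id
        self.t_fail = t_fail  # assumed timeout, measured in gossip rounds
        self.alive = True
        # Highest heartbeat seen for each node, and the round it last increased.
        self.heartbeat = {i: 0 for i in all_ids}
        self.last_update = {i: 0 for i in all_ids}

    def tick(self, now):
        """Advance this node's own heartbeat once per round."""
        self.heartbeat[self.node_id] += 1
        self.last_update[self.node_id] = now

    def merge(self, remote_heartbeat, now):
        """Keep the larger counter for every node listed in a received table."""
        for i, hb in remote_heartbeat.items():
            if hb > self.heartbeat[i]:
                self.heartbeat[i] = hb
                self.last_update[i] = now

    def suspects(self, now):
        """Nodes whose counter has been stale for more than t_fail rounds."""
        return {
            i
            for i, t in self.last_update.items()
            if i != self.node_id and now - t > self.t_fail
        }


def simulate(n_nodes=8, rounds=100, fail_node=3, fail_round=10, t_fail=8, seed=0):
    """Run push gossip with one permanent node failure injected at fail_round."""
    rng = random.Random(seed)
    ids = list(range(n_nodes))
    nodes = [Node(i, ids, t_fail) for i in ids]
    for now in range(1, rounds + 1):
        if now == fail_round:
            nodes[fail_node].alive = False  # injected permanent failure
        for node in nodes:
            if not node.alive:
                continue
            node.tick(now)
            # Gossip to one random peer; a message to a dead peer is lost.
            peer = nodes[rng.choice([i for i in ids if i != node.node_id])]
            if peer.alive:
                peer.merge(node.heartbeat, now)
    return nodes, rounds


if __name__ == "__main__":
    nodes, now = simulate()
    for n in nodes:
        if n.alive:
            print(n.node_id, sorted(n.suspects(now)))
```

In this sketch no multicast or acknowledgment is needed: a failure shows up only as the absence of heartbeat increases. A small `t_fail` detects failures sooner but raises the chance that a live node is suspected simply because fresh gossip about it arrived late.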
Mark W. Burns, Alan D. George, Bradley A. Wallace
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 1999
Where CLUSTER
Authors Mark W. Burns, Alan D. George, Bradley A. Wallace