Sciweavers

SIGMETRICS
2002
ACM

Improving cluster availability using workstation validation

We demonstrate a framework for improving the availability of cluster-based Internet services. Our approach models Internet services as a collection of interconnected components, each possessing well-defined interfaces and failure semantics. Such a decomposition allows designers to engineer high availability based on an understanding of the interconnections and isolated fault behavior of each component, as opposed to ad hoc methods. In this work, we focus on using the entire commodity workstation as a component because it possesses natural, fault-isolated interfaces. We define a failure event as a reboot, both because a workstation is unavailable during a reboot and because reboots are symptomatic of a larger class of failures, such as configuration and operator errors. Our observations of three distinct clusters show that the time between reboots is best modeled by a Weibull distribution with a shape parameter of less than 1, implying that a workstation becomes more reliable the l...
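The Weibull claim in the abstract can be illustrated with a minimal sketch. It relies only on a standard property of the distribution: the hazard rate is h(t) = (k/λ)(t/λ)^(k-1), which decreases with t whenever the shape k is below 1. The function names and the shape and scale values below are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous reboot rate at time t since the last reboot."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Probability that a workstation runs longer than t without a reboot."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale positive")
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters for illustration only (not from the paper).
SHAPE = 0.7   # shape below 1 gives a decreasing hazard rate
SCALE = 30.0  # scale, in days

# Hazard rate after 1, 10, and 100 days of uptime.
rates = [weibull_hazard(t, SHAPE, SCALE) for t in (1, 10, 100)]

for days, rate in zip((1, 10, 100), rates):
    print(f"uptime {days:>3} days: hazard {rate:.4f} per day")
```

With a shape of exactly 1 the hazard is constant (the exponential case), so the uptime of a workstation would carry no information about its next reboot; a shape below 1 is what makes uptime informative.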
Taliver Heath, Richard P. Martin, Thu D. Nguyen
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where SIGMETRICS
Authors Taliver Heath, Richard P. Martin, Thu D. Nguyen