Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly avai...
This paper shows how the steady-state availability and failure frequency can be calculated in a single pass for very large systems, when the availability is expressed as a product...
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Unreliable failure detectors are a well known means to enrich asynchronous distributed systems with time-free semantics that allow to solve consensus in the presence of crash failu...
A failure detector is a distributed oracle that provides processes in a distributed system with hints about failures. The notion of a weakest failure detector captures the exact a...