Failure detectors (or, more accurately, Failure Suspectors, FS) appear to be a fundamental service upon which to build fault-tolerant, distributed applications. This paper shows t...
Interval availability is a dependability measure defined by the fraction of time during which a system is in operation over a finite observation period. The computation of its d...
Although the present work does in fact employ training data, it does so in the interest of calibrating the results obtai... Six hundred faults were induced by injection into five live ...
This paper describes and evaluates two algorithms for performing on-line failure recovery (data reconstruction) in redundant disk arrays. It presents an implementation of disk-ori...
Mark Holland, Garth A. Gibson, Daniel P. Siewiorek
Fault tolerance requirements for near-term disk array storage systems are analyzed. The excellent reliability provided by RAID Level 5 data organization is seen to be insufficient f...
Currently existing message logging protocols demonstrate a classic pessimistic vs. optimistic tradeoff. We show that the optimistic–pessimistic tradeoff is not inherent to the p...
We present a new test response compression method called cumulative balance testing (CBT) that extends both balance testing and accumulator compression testing. CBT uses an accumulat...