Abstract. We study the problem of global predicate detection in presence of permanent and transient failures. We term the transient failures as small faults. We show that it is imp...
To protect data and recover data in case of failures, Linux operating system has built-in MD device that implements RAID architectures. Such device can recover data in case of sin...
Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In...
Abstract. We study the problem of achieving reliable communication with quiescent algorithms (i.e., algorithms that eventually stop sending messages) in asynchronous systems with p...
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...