Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
This paper presents an experimental evaluation of a brake-by-wire application that tolerates transient faults by temporal error masking. A specially designed real-time kernel that...
Joakim Aidemark, Jonny Vinter, Peter Folkesson, Jo...
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
This paper presents DARX, our framework for building applications that provide adaptive fault tolerance. It relies on the fact that multi-agent platforms constitute a very strong ...
The advent of deep sub-micron technology has exacerbated reliability issues in on-chip interconnects. In particular, single event upsets, such as soft errors, and hard faults are ...
Dongkook Park, Chrysostomos Nicopoulos, Jongman Ki...