Protocols which solve agreement problems are essential building blocks for fault tolerant distributed applications. While many protocols have been published, little has been done ...
Traditional problem determination techniques rely on static dependency models that are difficult to generate accurately in today’s large, distributed, and dynamic application e...
Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armand...
The NASA Remote Exploration and Experimentation (REE) Project, managed by the Jet Propulsion Laboratory, has the vision of bringing commercial supercomputing technology into space...
This paper reports work to support dependability arguments about the future reliability of a product before there is direct empirical evidence. We develop a method for estimating ...
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
This paper presents an experimental evaluation of a brake-by-wire application that tolerates transient faults by temporal error masking. A specially designed real-time kernel that...
Joakim Aidemark, Jonny Vinter, Peter Folkesson, Jo...
We present ideas on how to structure software systems for high availability by considering MTTR/MTTF characteristics of components in addition to the traditional criteria, such as...
George Candea, James Cutler, Armando Fox, Rushabh ...
This paper describes a Secure INtrusion-Tolerant Replication Architecture1 (SINTRA) for coordination in asynchronous networks subject to Byzantine faults. SINTRA supplies a number...