The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
We present a distributed protocol for maintaining a maximum flow spanning tree in a network, with a designated node as the root of the tree. This maximum flow spanning tree can be...
A Disruption Tolerant Network (DTN) is characterized by frequent partitions and intermittent connectivity. Power management in such networks is challenging. Existing power man...
This paper presents an experimental evaluation of a brake-by-wire application that tolerates transient faults by temporal error masking. A specially designed real-time kernel that...
Joakim Aidemark, Jonny Vinter, Peter Folkesson, Jo...
A critical challenge in creating effective open multi-agent systems is enabling them to operate effectively in the face of potential failures. In this paper, we present an experimen...