Large-scale parallel computing increasingly relies on clusters with thousands of processors. At such large node counts, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
Abstract. Nature is considered one promising area to search for inspiration in designing robotic systems. Some work in swarm robotics has tried to build systems that resemble distr...
Abstract. This article presents a fault-tolerant extension for the Naimi-Trehel token-based mutual exclusion algorithm. Contrary to the extension proposed by Naimi-Trehel, our appro...
This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network abstraction called the Network Storage Stack is presented, along with...
Scott Atchley, Stephen Soltesz, James S. Plank, Mi...
The design and implementation of distributed real-time dependable systems is often dominated by non-functional considerations like timeliness, object placement and fault tolerance...