Abstract. Nature is considered one promising area to search for inspiration in designing robotic systems. Some work in swarm robotics has tried to build systems that resemble distr...
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
This paper describes a new method for providingtransparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
As the desire of scientists to perform ever larger computations drives the size of today’s high performance computers from hundreds, to thousands, and even tens of thousands of ...