Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to ana...
Today, large-scale parallel systems are available at relatively low cost. Many such powerful systems have been installed all over the world and the number of users is always incre...
Fault tolerance is one of the key issues for large-scale applications executed on high-performance computing systems. In a cluster federation, clusters are gathered to provide hug...
We describe a very large-scale distributed robotic system, involving a team of over 100 robots, that has been successfully deployed in large, unknown indoor environments, over ext...
This paper presents a new approach for analyzing the performance of grid scheduling algorithms for tasks with dependencies. Finding the optimal procedures for DAG scheduling in Gr...