Abstract. A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
With rapid increase of parallel computation systems in their sizes, it is inevitable to develop algorithms that are applicable even if there exist faulty elements in the systems. ...
In recent years, there have been a few proposals to add a small amount of trusted hardware at each replica in a Byzantine fault tolerant system to cut back replication factors. Th...
Allen Clement, Flavio Junqueira, Aniket Kate, Rodr...
Providing reliable fault-tolerant sensors is a challenge for distributed systems. The demonstration setup combines three sensors and allows to inject different faults that are rel...
We describe and test a software approach to overcoming radiation-induced errors in spaceborne applications running on commercial off-the-shelf components. The approach uses checks...