Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Dynamic Parallel Schedules (DPS) is a flow graph based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch
Most application level fault tolerance schemes in literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
This paper describes a new method for providingtransparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...