This paper presents a component model for building distributed applications with fault-tolerance requirements. The AFT-CCM model selects the configuration of replicated services d...
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in ...
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
In order to be able to tolerate the effects of faults, we must first detect the symptoms of faults, i.e. the errors. This paper evaluates the error detection properties of an erro...
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...