Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to ana...
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
We present the tool Sycraft (SYmboliC synthesizeR and Adder of Fault-Tolerance). In Sycraft, a distributed fault-intolerant program is specified in terms of a set of processes and ...
In this paper we present an approach to the design optimization of faulttolerant embedded systems for safety-critical applications. Processes are statically scheduled and communic...
Viacheslav Izosimov, Paul Pop, Petru Eles, Zebo Pe...