To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
Quorum protocols offer several benefits when used to maintain replicated data but techniques for reducing overheads associated with them have not been explored in detail. It is d...
Lei Kong, Deepak J. Manohar, Mustaque Ahamad, Arun...
We present an algorithm based on temporal-epistemic model checking combined with fault injection to analyse automatically the diagnosability of faults by agents in the system. We d...
The fault tolerance provided by FT-CORBA is basically static, that is, once the fault tolerance properties of a group of replicated processes defined, they cannot be modified in r...
The proposed software technique is a very low cost and an effective solution towards designing Byzantine fault tolerant computing application systems that are not so safety critic...