We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
Abstract-- High performance computing platforms like Clusters, Grid and Desktop Grids are becoming larger and subject to more frequent failures. MPI is one of the most used message...
In this paper we propose a fault-tolerant processor architecture and an associated fault-tolerant computation model capable of fault tolerance in the nanoelectronic environment th...