Group communication protocols are used in fault-tolerant systems to maintain strong replica consistency. The Fault-Tolerant Multicast Protocol (FTMP) described here is a group comm...
Louise E. Moser, P. M. Melliar-Smith, Ruppert R. K...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Multi-party communication complexity involves distributed computation of a function over inputs held by multiple distributed players. A key focus of distributed computing research...
Binbin Chen, Haifeng Yu, Yuda Zhao, Phillip B. Gib...
The paper describes a metaobject architecture for distributed fault-tolerant systems. Basically, metaobject protocols enable functional objects to be independent from meta-function...
Fault tolerance in MPI is becoming a major issue in the HPC community. Several approaches are envisioned, from user- or programmer-controlled fault tolerance to fully automatic fault ...
Aurelien Bouteiller, Boris Collin, Thomas Hé...