Group communications are commonly used in parallel and distributed environment. However, existing migration mechanisms do not support group communications. This weakness prevents ...
Abstract. With the number of computing elements spiraling to hundred of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
Group communication protocols are used in fault-tolerant systems to maintain strong replica consistency. The FaultTolerant Multicast Protocol (FTMP) described here is a group comm...
Louise E. Moser, P. M. Melliar-Smith, Ruppert R. K...
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to ...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....
Secure, fault-tolerant distributed systems are difficult to build, to validate, and to operate. Conservative design for such systems dictates that their security and fault toleran...