Manycast is a group communication primitive wherein the source is required to send data packets to a certain number of a given set of destinations. In this article, we design faul...
New single-machine environments are emerging from abundant computation available through multiple cores and secure virtualization. In this paper, we describe the research challeng...
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas Hé...
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...
Failures of all forms happen: from losing single network packets to site-wide disasters. Since businesses rely heavily on their data, it is imperative that failures require minima...