Checkpointing and rollback recovery is a very effective technique for tolerating transient faults and preventive shutdowns. In the past, most of the checkpointing schemes published i...
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool t...
Michael Abd-El-Malek, Gregory R. Ganger, Garth R. ...
Group communications are commonly used in parallel and distributed environments. However, existing migration mechanisms do not support group communications. This weakness prevents ...
This paper presents recent developments in fault management for P2P-MPI, a grid middleware, covering fault tolerance for applications and fault detect...
This paper presents MojaveFS, a distributed file system with support for sequential consistency. It provides location transparency and makes use of replication for reliability an...
Cristian Tapus, David A. Noblet, Vlad Grama, Jason...