Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
In recent years, exciting technological advances have been made in development of flexible electronics. These technologies offer the opportunity to weave computation, communicat...
Roozbeh Jafari, Foad Dabiri, Philip Brisk, Majid S...
It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On t...
This paper presents measured probability density functions (pdfs) for the end-to-end latency of two-way remote method invocations from a CORBA client to a replicated CORBA server ...
Wenbing Zhao, Louise E. Moser, P. M. Melliar-Smith
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...