MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware

Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing. This approach has also been extended to message-passing systems through user-transparent process checkpointing and message logging. Studies of many forms of rollback recovery, ranging from communication-induced checkpointing to pessimistic and synchronous solutions, have been reported in the literature. However, many of these solutions incur high overhead because they cannot exploit application-level information.
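The abstract contrasts user-transparent checkpointing, which must capture a whole process image, with approaches that can exploit application-level information. As a rough illustration of that distinction only (this is not the MPI/FT mechanism, which this summary does not describe), the Python sketch below checkpoints just the state an iterative loop needs in order to resume. The function names, the JSON file format, and the toy `run` solver are all hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Atomically write only the application-level state needed to resume."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    # Atomic rename: a crash never leaves a half-written checkpoint behind.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (step, state) from the last checkpoint, or None if none exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def run(path, total_steps, interval, fail_at=None):
    """Hypothetical iterative solver whose only live state is a running sum.

    It checkpoints every `interval` steps; `fail_at` simulates a crash.
    """
    restored = load_checkpoint(path)
    step, state = restored if restored is not None else (0, {"acc": 0})
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["acc"] += step
        step += 1
        if step % interval == 0:
            save_checkpoint(path, step, state)
    return state["acc"]
```

If the first run crashes at step 7 with a checkpoint interval of 5, a restart resumes from step 5 rather than step 0, so only two steps of work are repeated. The checkpoint holds one integer and one small dictionary instead of a full memory image. That smaller saved state is the kind of saving application knowledge makes possible.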

Rajanikanth Batchu, Yoginder S. Dandass, Anthony Skjellum, Murali Beddhu

Checkpointing | CLUSTER 2004 | Distributed And Parallel Computing | Fault Tolerance | Parallel Systems

Post Info

Added	16 Dec 2010
Updated	16 Dec 2010
Type	Journal
Year	2004
Where	CLUSTER
Authors	Rajanikanth Batchu, Yoginder S. Dandass, Anthony Skjellum, Murali Beddhu