Team-Based Message Logging: Preliminary Results



Fault tolerance will be a fundamental imperative in the next decade, as machines containing hundreds of thousands of cores are installed at various locations. In this context, the traditional checkpoint/restart model does not seem to be a suitable option, since it forces all processors to roll back to their latest checkpoint when a single processor fails. In-memory message logging is an alternative that avoids this global restoration and instead replays the logged messages to the failed processor. However, message logging carries a large memory overhead, because each message must be logged so that it can be replayed if a failure occurs. In this paper, we introduce a technique that reduces the memory demand of message logging by grouping processors into teams. Each team acts as a failure unit: if one team member fails, all the other members of that team roll back to their latest checkpoint and start the recovery process. This eliminates the ...
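The team-as-failure-unit idea in the abstract can be illustrated with a minimal sketch that computes which processors roll back when one of them fails. The function names and the contiguous block mapping of ranks to teams are assumptions made for illustration; the abstract does not say how teams are formed, and this is not the authors' implementation.

```python
def team_of(rank: int, team_size: int) -> int:
    """Team index of a processor rank (assumed contiguous block mapping)."""
    return rank // team_size


def rollback_set(failed_rank: int, num_procs: int, team_size: int) -> list[int]:
    """Ranks that roll back to their latest checkpoint when failed_rank fails.

    The whole team of the failed processor rolls back; other teams do not.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if not 0 <= failed_rank < num_procs:
        raise ValueError("failed_rank is out of range")
    start = team_of(failed_rank, team_size) * team_size
    end = min(start + team_size, num_procs)  # the last team may be smaller
    return list(range(start, end))


if __name__ == "__main__":
    # 16 processors in teams of 4: a failure on rank 5 rolls back its team.
    print(rollback_set(5, num_procs=16, team_size=4))
```

Under these assumptions, a team size of 1 rolls back only the failed processor, as in plain message logging, while a single team spanning all processors rolls back every processor, as in checkpoint/restart. Team sizes in between trade off the number of processors that roll back against the memory saving the abstract describes.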

Esteban Meneses, Celso L. Mendes, Laxmikant V. Kalé


CCGRID 2010 | Distributed And Parallel Computing | In-memory Message Logging | Latest Checkpoint | Message Logging |


Post Info

Added	08 Nov 2010
Updated	08 Nov 2010
Type	Conference
Year	2010
Where	CCGRID
Authors	Esteban Meneses, Celso L. Mendes, Laxmikant V. Kalé
