Sciweavers

TPDS
1998

On Coordinated Checkpointing in Distributed Systems

14 years 16 days ago
On Coordinated Checkpointing in Distributed Systems
—Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage. However, the approach suffers from high overhead associated with the checkpointing process. Two approaches are used to reduce the overhead: First is to minimize the number of synchronization messages and the number of checkpoints, the other is to make the checkpointing process nonblocking. These two approaches were orthogonal in previous years until the Prakash-Singhal algorithm [18] combined them. In other words, the Prakash-Singhal algorithm forces only a minimum number of processes to take checkpoints and it does not block the underlying computation. However, we found two problems in this algorithm. In this paper, we identify these problems and prove a more general result: There does not exist a nonblocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this general res...
Guohong Cao, Mukesh Singhal
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 1998
Where TPDS
Authors Guohong Cao, Mukesh Singhal
Comments (0)