Coherence-based Coordinated Checkpointing for Software Distributed Shared Memory Systems

Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel computing environments on clusters of workstations. In this paper, we propose a new, efficient coordinated checkpointing technique, called coherence-based coordinated checkpointing (CCC), for SDSM. Our CCC minimizes both the checkpointing overhead during failure-free execution and the cost of recovery from failures by leveraging existing coherence information maintained by SDSM. In the presence of system failures, it allows SDSM to recover from the most recent checkpoint, saving the re-computation time. We have performed experiments on a cluster of eight Sun Ultra-5 workstations, comparing our CCC technique against both simple coordinated checkpointing (SCC) and incremental coordinated checkpointing (ICC) techniques by actually implementing these techniques in TreadMarks, a state-of-the-art SDSM system. The exper...

Angkul Kongmunvattana, Santipong Tanchatchawal, Ni

CCC Technique | Checkpointing | Distributed And Parallel Computing | ICC Techniques | ICDCS 2000 |

Post Info

Added	31 Jul 2010
Updated	31 Jul 2010
Type	Conference
Year	2000
Where	ICDCS
Authors	Angkul Kongmunvattana, Santipong Tanchatchawal, Nian-Feng Tzeng
