Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System



Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime of a process. Checkpointing has been a popular method of providing fault tolerance in high-end systems. While considerable research has been done to optimize checkpointing, in practice the method still imposes a high overhead on users. In this paper, we study the checkpointing overhead seen by applications running on leadership-class machines such as the IBM Blue Gene/P at Argonne National Laboratory. We study various applications and design a methodology to help users understand and choose a checkpointing frequency that reduces the overhead incurred. In particular, we study three popular applications (the Grid-Based Projector-Augmented Wave application, the Car-Parrinello Molecular Dynamics application, and a Nek5000 computational fluid dynamics application) and analyze their memory usage and...
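
The abstract does not state the paper's own model for choosing a checkpoint frequency, so the sketch below is illustrative only. It swaps in Young's classical first-order approximation, in which the optimal compute time between checkpoints is T = sqrt(2 C M) for checkpoint cost C and mean time between failures M, and it assumes that C is dominated by writing each process's memory at a fixed aggregate I/O bandwidth. The function names and every number in the usage example are hypothetical and are not measurements from Blue Gene/P.

```python
import math


def checkpoint_write_time(mem_per_process_gb: float, num_processes: int,
                          aggregate_bw_gb_s: float) -> float:
    """Seconds to dump all process memory, assuming the write is purely bandwidth-bound."""
    return mem_per_process_gb * num_processes / aggregate_bw_gb_s


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimum: compute time between checkpoints, T = sqrt(2*C*M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order expected fraction of time lost per interval:
    C/T for writing checkpoints plus T/(2M) for recomputing lost work after a failure."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    # Hypothetical inputs: 1 GB per process, 1024 processes, 10 GB/s to storage, 24 h MTBF.
    cost = checkpoint_write_time(1.0, 1024, 10.0)
    mtbf = 24 * 3600.0
    interval = young_interval(cost, mtbf)
    print(f"checkpoint cost: {cost:.1f} s")
    print(f"suggested interval: {interval:.0f} s")
    print(f"estimated overhead: {overhead_fraction(interval, cost, mtbf):.2%}")
```

Under this model the checkpoint cost grows linearly with the memory footprint, so an application that checkpoints less state can afford to checkpoint more often. This is one reason per-application memory usage matters when picking a frequency.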

Harish Gapanati Naik, Rinku Gupta, Pete Beckman


Checkpointing | Computational Fluid Dynamics | Distributed And Parallel Computing | Dynamics Application | ICPPW 2009


Post Info

Added	19 Feb 2011
Updated	19 Feb 2011
Type	Journal
Year	2009
Where	ICPPW
Authors	Harish Gapanati Naik, Rinku Gupta, Pete Beckman


Sciweavers
