
ICPPW 2009, IEEE

Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System

Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime of a process. Checkpointing is a widely used method for providing fault tolerance in high-end systems. Although considerable research has been done to optimize checkpointing, in practice the method still imposes high overhead on users. In this paper, we study the checkpointing overhead seen by applications running on leadership-class machines such as the IBM Blue Gene/P at Argonne National Laboratory. We study various applications and design a methodology to help users understand the overhead they incur and choose a checkpointing frequency that reduces it. In particular, we study three popular applications (the Grid-Based Projector-Augmented Wave application, the Car-Parrinello Molecular Dynamics application, and a Nek5000 computational fluid dynamics application) and analyze their memory usage and...
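Choosing a checkpointing frequency trades the cost of writing checkpoints against the work lost when a fault forces a restart. The truncated abstract does not describe the authors' methodology, so the sketch below uses a different, well-known technique instead: Young's first-order approximation of the optimal interval, T ≈ sqrt(2·C·M), where C is the time to write one checkpoint and M is the mean time between failures (MTBF). Assumptions: checkpoint time is bound by aggregate I/O bandwidth and proportional to application memory footprint, and C is much smaller than M. All function names and example numbers are hypothetical, not measurements from the paper.

```python
import math


def checkpoint_time_s(bytes_per_process: float, processes: int,
                      io_bandwidth_bps: float) -> float:
    """Hypothetical cost model: one full checkpoint is I/O-bandwidth bound,
    so its duration is total memory footprint divided by aggregate bandwidth."""
    return bytes_per_process * processes / io_bandwidth_bps


def young_interval_s(checkpoint_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal interval, sqrt(2 * C * MTBF).
    Only meaningful when the checkpoint time is much smaller than the MTBF."""
    if checkpoint_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint time and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_s: float,
                      mtbf_s: float) -> float:
    """First-order wasted fraction of run time: checkpoint writes (C / T)
    plus expected recomputation after a failure (T / (2 * MTBF))."""
    return checkpoint_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    # Hypothetical job: 512 MiB per process, 8192 processes,
    # 50 GB/s aggregate I/O bandwidth, 24-hour system MTBF.
    c = checkpoint_time_s(512 * 2**20, 8192, 50e9)
    m = 24 * 3600.0
    t = young_interval_s(c, m)
    print(f"checkpoint time  : {c:.1f} s")
    print(f"suggested period : {t / 60:.1f} min")
    print(f"overhead fraction: {overhead_fraction(t, c, m):.3f}")
```

At Young's interval the two overhead terms are equal, so checkpointing more often wastes time writing checkpoints, while checkpointing less often wastes time redoing lost work. A larger memory footprint raises C and therefore lengthens the suggested interval.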
Harish Gapanati Naik, Rinku Gupta, Pete Beckman
Type Journal