Compiler-Enhanced Incremental Checkpointing



As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures: it allows applications to periodically save their state and to restart the computation after a failure. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains by far the most popular approach because of its superior performance. This paper focuses on improving the performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis is shown to significantly reduce checkpoint sizes (by up to 78%) and to enable asynchronous checkpointing.
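The general idea of incremental checkpointing, saving only the state that changed since the previous checkpoint, can be illustrated with a minimal sketch. This is not the paper's method: the paper derives the needed information through compiler analysis, whereas the sketch below substitutes explicit runtime dirty tracking. The class `IncrementalCheckpointer`, its methods, and the in-memory `log` standing in for stable storage are all hypothetical names chosen for illustration.

```python
import pickle


class IncrementalCheckpointer:
    """Toy checkpointer that saves only variables written since the last checkpoint.

    Assumption: dirty tracking is done at run time through write(); the paper
    instead uses a compiler analysis, which this sketch does not model.
    """

    def __init__(self):
        self.state = {}     # live application state: name -> value
        self.dirty = set()  # names written since the last checkpoint
        self.log = []       # serialized checkpoints (stand-in for stable storage)

    def write(self, name, value):
        # All updates must go through write(); in-place mutation of a stored
        # object would not be noticed by this simple scheme.
        self.state[name] = value
        self.dirty.add(name)

    def checkpoint(self):
        # Serialize only the modified variables and return the checkpoint size.
        delta = {name: self.state[name] for name in self.dirty}
        blob = pickle.dumps(delta)
        self.log.append(blob)
        self.dirty.clear()
        return len(blob)

    def restore(self):
        # Replay checkpoints oldest to newest; later values overwrite earlier ones.
        recovered = {}
        for blob in self.log:
            recovered.update(pickle.loads(blob))
        return recovered


ckpt = IncrementalCheckpointer()
ckpt.write("grid", list(range(1000)))  # large array, written once
ckpt.write("step", 0)
full_size = ckpt.checkpoint()          # first checkpoint contains everything

ckpt.write("step", 1)                  # only the counter changes
delta_size = ckpt.checkpoint()         # second checkpoint contains only "step"

recovered = ckpt.restore()
```

In this sketch the second checkpoint is smaller than the first because the unchanged `grid` is not saved again, and replaying both checkpoints reproduces the current state.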

Greg Bronevetsky, Daniel Marques, Keshav Pingali, Radu Rugina


Distributed And Parallel Computing | LCPC 2007 | Manual Application-level Checkpointing | Modern Supercomputing Systems | Peta-flop Performance Range


Post Info

Added	08 Jun 2010
Updated	08 Jun 2010
Type	Conference
Year	2007
Where	LCPC
Authors	Greg Bronevetsky, Daniel Marques, Keshav Pingali, Radu Rugina
