LCPC
2007
Springer

Compiler-Enhanced Incremental Checkpointing

As modern supercomputing systems reach the petaflop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures: it allows applications to periodically save their state and to restart the computation after a failure. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains by far the most popular approach because of its superior performance. This paper focuses on improving the performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis is shown to significantly reduce checkpoint sizes (by up to 78%) and to enable asynchronous checkpointing.
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference
Year 2007
Where LCPC
Authors Greg Bronevetsky, Daniel Marques, Keshav Pingali, Radu Rugina