

Recent advances in checkpoint/recovery systems

Checkpoint and Recovery (CPR) systems have many uses in high-performance computing, and many developers have therefore implemented them by hand in their applications. One use of checkpointing is to mitigate the effects of interruptions in computational service, both planned and unplanned. In fact, some supercomputing centers expect their users to checkpoint as a matter of policy. Yet few centers provide fully automatic checkpointing systems for their high-end production machines. This paper is a status report on our work on the C3 family of systems for (almost) fully automatic checkpointing of scientific applications. To date, we have shown that our techniques can be used to checkpoint sequential, MPI, and OpenMP applications written in C, Fortran, and several other languages. A novel aspect of our work is that we have not built a single checkpointing system; rather, we have developed a methodology and a set of techniques that have enabled us to devel...
Added 12 Jun 2010
Updated 12 Jun 2010
Type Conference
Year 2006
Where IPPS
Authors Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, Paul Stodghill