As the core count in high-performance computing systems keeps increasing, faults are becoming common place. Checkpointing addresses such faults but captures full process images ev...
Chao Wang, Frank Mueller, Christian Engelmann, Ste...
Checkpointing is a commonly used approach to provide system fault-tolerance. However, using a constant checkpointing frequency may compromise the system's overall performance ...
Advances in hardware have enabled many long-running applications to execute entirely in main memory. As a result, these applications have increasingly turned to database technique...
Tuan Cao, Marcos Antonio Vaz Salles, Benjamin Sowe...
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint reque...
Cooperative checkpointing, in which the system dynamically skips checkpoints requested by applications at runtime, can exploit system-level information to improve performance and ...