In this paper, we introduce an efficient technique for checkpointing multithreaded applications. Our approach makes use of processes constructed around the ARMOR (Adaptive Reconfi...
Keith Whisnant, Zbigniew Kalbarczyk, Ravishankar K...
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
Checkpointing is a very effective technique to tolerate the occurrence of failures in distributed and parallel applications. The existing algorithms in the literature are basicall...
Several schemes for checkpointing and rollback recovery have been reported in the literature. In this paper, we analyze some of these schemes under a stochastic model. We have der...
Multiple threads running in a single, shared address space is a simple model for writing parallel programs for symmetric multiprocessor (SMP) machines and for overlapping I/O and ...