Sciweavers

48 search results - page 6 / 10
» Self-stabilizing algorithm for checkpointing in a distribute...
Sort
View
ICS
2004
Tsinghua U.
14 years 1 months ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
ICS
2011
Tsinghua U.
12 years 11 months ago
High performance linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
Teresa Davies, Christer Karlsson, Hui Liu, Chong D...
SC
2009
ACM
14 years 2 months ago
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
VEE
2006
ACM
126views Virtualization» more  VEE 2006»
14 years 1 months ago
A new approach to real-time checkpointing
The progress towards programming methodologies that simplify the work of the programmer involves automating, whenever possible, activities that are secondary to the main task of d...
Antonio Cunei, Jan Vitek
SIGSOFT
2007
ACM
14 years 8 months ago
Efficient checkpointing of java software using context-sensitive capture and replay
Checkpointing and replaying is an attractive technique that has been used widely at the operating/runtime system level to provide fault tolerance. Applying such a technique at the...
Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin