Sciweavers

» On Coordinated Checkpointing in Distributed Systems
IPPS
2006
IEEE
14 years 1 month ago
Cooperative checkpointing theory
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint reque...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo
JPDC
2007
13 years 7 months ago
Self-stabilizing algorithm for checkpointing in a distributed system
If the variables used for a checkpointing algorithm have data faults, the existing checkpointing and recovery algorithms may fail. In this paper, self-stabilizing data fault detec...
Partha Sarathi Mandal, Krishnendu Mukhopadhyaya
IPPS
2009
IEEE
14 years 2 months ago
Compiler-enhanced incremental checkpointing for OpenMP applications
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety ...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...
LCPC
2007
Springer
14 years 1 month ago
Compiler-Enhanced Incremental Checkpointing
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety o...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...
POPL
2005
ACM
14 years 7 months ago
Transactors: a programming model for maintaining globally consistent distributed state in unreliable environments
We introduce transactors, a fault-tolerant programming model for composing loosely-coupled distributed components running in an unreliable environment such as the internet into sy...
John Field, Carlos A. Varela