Sciweavers

» On Coordinated Checkpointing in Distributed Systems
CONCURRENCY
2007
13 years 8 months ago
Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems
Panagiotis Katsaros, Lefteris Angelis, Constantine...
ICS
2004
Tsinghua U.
14 years 2 months ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
TROB
2002
13 years 8 months ago
Distributed surveillance and reconnaissance using multiple autonomous ATVs: CyberScout
The objective of the CyberScout project is to develop an autonomous surveillance and reconnaissance system using a network of all-terrain vehicles. In this paper, we focus on two f...
Mahesh Saptharishi, C. Spence Oliver, Christopher ...
IPPS
2005
IEEE
14 years 2 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
ISCA
2002
IEEE
14 years 1 months ago
SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...