Sciweavers

1532 search results - page 16 / 307
» A Comparison of RESTART Implementations
IPPS
2007
IEEE
14 years 1 month ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé
DSN
2000
IEEE
13 years 11 months ago
Exploiting Non-Determinism for Reliability of Mobile Agent Systems
An important technical hurdle blocking the adoption of mobile agent technology is the lack of reliability. Designing a reliable mobile agent system is especially challenging since...
Ajay Mohindra, Apratim Purakayastha, Prasannaa Tha...
PADS
1998
ACM
13 years 11 months ago
Fault-Tolerant Distributed Simulation
In traditional distributed simulation schemes, the entire simulation needs to be restarted if any of the participating LPs crashes. This is highly undesirable for long running simulati...
Om P. Damani, Vijay K. Garg
ICPADS
2010
IEEE
13 years 5 months ago
Hybrid Checkpointing for MPI Jobs in HPC Environments
As the core count in high-performance computing systems keeps increasing, faults are becoming commonplace. Checkpointing addresses such faults but captures full process images ev...
Chao Wang, Frank Mueller, Christian Engelmann, Ste...
USENIX
1990
13 years 8 months ago
swm: An X Window Manager Shell
swm is a policy-free, user configurable window manager client for the X Window System. Besides providing basic window manager functionality, swm introduces new features not found ...
Thomas E. LaStrange