Sciweavers

35 search results - page 3 / 7
» Transparent checkpoints of closed distributed systems in Emu...
ICDCS
2012
IEEE
11 years 9 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
CLUSTER
2004
IEEE
13 years 10 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
SAC
2005
ACM
14 years 17 days ago
Adaptation point analysis for computation migration/checkpointing
Finding the appropriate location of adaptation points for computation migration/checkpointing is critical since the distance between two consecutive adaptation points determines t...
Yanqing Ji, Hai Jiang, Vipin Chaudhary
WORDS
2003
IEEE
14 years 8 days ago
Decentralized Resource Management and Fault-Tolerance for Distributed CORBA Applications
Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically, and in an arbitrary manner, can lead to the application not ...
Carlos F. Reverte, Priya Narasimhan
ECOOPW
1999
Springer
13 years 11 months ago
Providing Policy-Neutral and Transparent Access Control in Extensible Systems
Extensible systems, such as Java or the SPIN extensible operating system, allow for units of code, or extensions, to be added to a running system in almost arbitrary fashion. Exte...
Robert Grimm, Brian N. Bershad