Sciweavers

Search results for: Fault Tolerance in a DSM Cluster Operating System

IPPS 1998, IEEE
Migration and Rollback Transparency for Arbitrary Distributed Applications in Workstation Clusters
Programmers and users of compute-intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...
Stefan Petri, Matthias Bolz, Horst Langendörf...

EGC 2005, Springer
Fault Tolerance in the R-GMA Information and Monitoring System
R-GMA (Relational Grid Monitoring Architecture) [1] is a grid monitoring and information system that provides a global view of data distributed across a grid system. R-GMA creates ...
Rob Byrom, Brian A. Coghlan, Andrew W. Cooke, Rone...

SRDS 1999, IEEE
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems
Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared memory abstraction (i.e., a coherent global a...
Angkul Kongmunvattana, Nian-Feng Tzeng

CLUSTER 2004, IEEE
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas Hé...

IJHPCA 2006
Fault-Tolerant Scheduling of Fine-Grained Tasks in Grid Environments
Divide-and-conquer is a well-suited programming paradigm for parallel Grid applications. Our Satin system efficiently schedules the fine-grained tasks of a divide-and-conquer appli...
Gosia Wrzesinska, Rob van Nieuwpoort, Jason Maasse...