Sciweavers

» Self Adaptive Application Level Fault Tolerance for Parallel...
CLUSTER
2002
IEEE
14 years 1 month ago
Design and Validation of Portable Communication Infrastructure for Fault-Tolerant Cluster Middleware
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
IPPS
2006
IEEE
14 years 2 months ago
Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources
As the desire of scientists to perform ever larger computations drives the size of today’s high performance computers from hundreds, to thousands, and even tens of thousands of ...
Zizhong Chen, Jack Dongarra
ISPAN
2000
IEEE
14 years 29 days ago
Fault-Tolerant Wormhole Routing in 2D Meshes
We present an adaptive fault-tolerant wormhole routing algorithm for 2D meshes. The main feature is that with the algorithm, a normal routing message, when blocked by some faulty ...
Jipeng Zhou, Francis C. M. Lau
HPDC
2008
IEEE
14 years 3 months ago
Dynasa: adapting grid applications to safety using fault-tolerant methods
Grid applications have been prone to encountering problems such as failures or malicious attacks during execution, due to their distributed and large-scale features. The applicati...
Xuanhua Shi, Jean-Louis Pazat, Eric Rodriguez, Hai...
TDSC
2011
13 years 3 months ago
Application-Level Diagnostic and Membership Protocols for Generic Time-Triggered Systems
We present on-line tunable diagnostic and membership protocols for generic time-triggered (TT) systems to detect crashes, send/receive omission faults and network parti...
Marco Serafini, Péter Bokor, Neeraj Suri, J...