Sciweavers

197 search results - page 19 / 40
» A Framework for Proactive Fault Tolerance
Sort
View
CCGRID
2006
IEEE
14 years 1 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
BCS
2008
13 years 9 months ago
Hardware Dependability in the Presence of Soft Errors
Using formal verification for designing hardware designs free from logic design bugs has been an active area of research since the last 15 years. Technology has matured and we hav...
Ashish Darbari, Bashir M. Al-Hashimi
HASE
1998
IEEE
13 years 12 months ago
Combining Various Solution Techniques for Dynamic Fault Tree Analysis of Computer Systems
Fault trees provide a graphical and logical framework for analyzing the reliability of systems. A fault tree provides a conceptually simple modeling framework to represent the sys...
Ragavan Manian, Joanne Bechta Dugan, David Coppit,...
EDCC
2008
Springer
13 years 9 months ago
A Distributed Approach to Autonomous Fault Treatment in Spread
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
Hein Meling, Joakim L. Gilje
CCS
2011
ACM
12 years 7 months ago
How to tell if your cloud files are vulnerable to drive crashes
This paper presents a new challenge—verifying that a remote server is storing a file in a fault-tolerant manner, i.e., such that it can survive hard-drive failures. We describe...
Kevin D. Bowers, Marten van Dijk, Ari Juels, Alina...