Sciweavers

276 search results - page 25 / 56
» A Fault Tolerant Abstraction for Transparent Distributed Pro...
Sort
View
HPCA
2009
IEEE
14 years 9 months ago
Eliminating microarchitectural dependency from Architectural Vulnerability
The Architectural Vulnerability Factor (AVF) of a hardware structure is the probability that a fault in the structure will affect the output of a program. AVF captures both microa...
Vilas Sridharan, David R. Kaeli
CCGRID
2006
IEEE
14 years 2 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
HPDC
2009
IEEE
14 years 3 months ago
Interconnect agnostic checkpoint/restart in open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
CORR
2010
Springer
136views Education» more  CORR 2010»
13 years 8 months ago
Applying Prolog to Develop Distributed Systems
Development of distributed systems is a difficult task. Declarative programming techniques hold a promising potential for effectively supporting programmer in this challenge. Whil...
Nuno P. Lopes, Juan A. Navarro, Andrey Rybalchenko...
IEEEPACT
2007
IEEE
14 years 2 months ago
Error Detection Using Dynamic Dataflow Verification
Continued scaling of CMOS technology to smaller transistor sizes makes modern processors more susceptible to both transient and permanent hardware faults. Circuitlevel techniques ...
Albert Meixner, Daniel J. Sorin