Sciweavers

535 search results - page 8 / 107
» Fault tolerant high performance computing by a coding approa...
Sort
View
ICPADS
1994
IEEE
13 years 11 months ago
Efficient Fault Tolerance: An Approach to Deal with Transient Faults in Multiprocessor Architectures
Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
HPDC
2009
IEEE
14 years 2 months ago
Interconnect agnostic checkpoint/restart in open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
DFT
2003
IEEE
142views VLSI» more  DFT 2003»
14 years 26 days ago
Exploiting Instruction Redundancy for Transient Fault Tolerance
This paper presents an approach for integrating fault-tolerance techniques into microprocessors by utilizing instruction redundancy as well as time redundancy. Smaller and smaller...
Toshinori Sato
ASPDAC
2005
ACM
109views Hardware» more  ASPDAC 2005»
13 years 9 months ago
Fault tolerant nanoelectronic processor architectures
In this paper we propose a fault-tolerant processor architecture and an associated fault-tolerant computation model capable of fault tolerance in the nanoelectronic environment th...
Wenjing Rao, Alex Orailoglu, Ramesh Karri
IPPS
2010
IEEE
13 years 5 months ago
Improving the performance of hypervisor-based fault tolerance
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...