Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
This paper presents an approach for integrating fault-tolerance techniques into microprocessors by utilizing instruction redundancy as well as time redundancy. Smaller and smaller...
In this paper we propose a fault-tolerant processor architecture and an associated fault-tolerant computation model capable of fault tolerance in the nanoelectronic environment th...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...