As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...
In this paper, we present a study on the fault tolerance nature of the island model when applied to Genetic Algorithms. Parallel and distributed models have been extensively appli...
In this paper, a fault-tolerant routing in 2-D meshes with dynamic faults is provided. It is based on an early work on minimal routing in 2-D meshes with static faults. Unlike man...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...