In this paper, we focus on reliability, one of the most fundamental and important challenges, in the nanoelectronics environment. For a processor architecture based on the unreliab...
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...
One of the mostsoughtaftersoftware innovation of thisdecade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
Modern distributed industrial control systems need improvements in their dependability. In this paper we study the dependability of a fault tolerant distributed industrial control ...