The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
There are a lot of applications that run on modern mobile operating systems. Inevitably, some of these applications fail in the hands of users. Diagnosing a failure to identify the...
Sharad Agarwal, Ratul Mahajan, Alice Zheng, Victor...
As the disks typically found in personal computers grow larger, protecting data by replicating it on a collection of “peer” systems rather than on dedicated high performance s...
Dmitry Brodsky, Michael J. Feeley, Norman C. Hutch...
In this paper we develop a recovery conscious framework for multi-core architectures and a suite of techniques for improving the resiliency and recovery efficiency of highly conc...
Sangeetha Seshadri, Lawrence Chiu, Cornel Constant...
— Extreme scaling practices in silicon technology are quickly leading to integrated circuit components with limited reliability, where phenomena such as early-transistor failures...
Andrea Pellegrini, Kypros Constantinides, Dan Zhan...