Scalable and fault tolerant runtime environments are needed to support and adapt to the underlying libraries and hardware which require a high degree of scalability in dynamic larg...
Thara Angskun, Graham E. Fagg, George Bosilca, Jel...
Reliability has become a serious concern as systems embrace nanometer technologies. In this paper, we propose a novel approach for organizing redundancy that provides high degree ...
Due to modern technology trends such as decreasing feature sizes and lower voltage levels, fault tolerance is becoming increasingly important in computing systems. Shared memory i...
The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a prio...
Michael D. Powell, Arijit Biswas, Shantanu Gupta, ...
Information integrity in cache memories is a fundamental requirement for dependable computing. Conventional architectures for enhancing cache reliability using check codes make it...