The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...
In this paper we introduce Resizable Data Composer-Cache (RDC-Cache). This novel cache architecture operates correctly at sub 500 mV in 65 nm technology tolerating large number of...
Avesta Sasan, Houman Homayoun, Ahmed M. Eltawil, F...
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...