Transient errors are one of the major reasons for system downtime in many systems. While prior research has mainly focused on the impact of transient errors on datapath, caches an...
Conventional error correcting code (ECC) schemes used in memories and caches cannot correct double bit errors caused by a single event upset (SEU). As memory density increases, mu...
Silent store instructions write values that exactly match the values that are already stored at the memory address that is being written. A recent study reveals that significant ...
The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...
—The number of CPUs in chip multiprocessors is growing at the Moore’s Law rate, due to continued technology advances. However, new technologies pose serious reliability challen...