In this study, we investigate cache fault-tolerance techniques to determine which will be most effective when on-chip memory cell defect probabilities exceed those of current technologies, as is widely anticipated for processor on-chip caches manufactured in future nanometer-scale technologies. Our most significant finding is that the devices in on-chip memory cells cannot be scaled at the same rate as devices in logic circuits, because the number of erroneous memory cells increases with voltage scaling; strong fault-tolerance techniques are therefore required. In addition, we propose a technique to minimize performance impacts under aggressive technology and voltage scaling. It works by merging pairs of faulty cache lines to make good lines; at high error rates it performs better than triple modular redundancy (TMR), and at lower cost. We also estimate up to 28% energy savings at low voltage relative to a recent fault-tolerance scheme [1].
David Roberts, Nam Sung Kim, Trevor N. Mudge
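
The line-merging idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each cache line carries a per-word fault bitmask (bit w set means word w is faulty), that two faulty lines can be merged only when their fault masks do not overlap, and that pairing is done greedily. The names `can_merge`, `pair_faulty_lines`, and `read_merged` are hypothetical.

```python
from typing import List, Sequence, Tuple


def can_merge(mask_a: int, mask_b: int) -> bool:
    """Assumption: two faulty lines can form one good line when no
    word position is faulty in both of them."""
    return (mask_a & mask_b) == 0


def pair_faulty_lines(fault_masks: Sequence[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Greedily pair faulty lines with non-overlapping fault masks.

    Returns (pairs, unpaired): index pairs that can be merged, and
    faulty line indices for which no partner was found.
    """
    faulty = [i for i, m in enumerate(fault_masks) if m != 0]
    used = set()
    pairs: List[Tuple[int, int]] = []
    unpaired: List[int] = []
    for pos, i in enumerate(faulty):
        if i in used:
            continue
        for j in faulty[pos + 1:]:
            if j not in used and can_merge(fault_masks[i], fault_masks[j]):
                pairs.append((i, j))
                used.update((i, j))
                break
        else:
            # No compatible partner remains for line i.
            unpaired.append(i)
    return pairs, unpaired


def read_merged(data_a: Sequence, data_b: Sequence, mask_a: int) -> list:
    """Read one logical line from a merged pair: take each word from
    line A unless A is faulty at that position, then take it from B."""
    return [data_b[w] if (mask_a >> w) & 1 else data_a[w]
            for w in range(len(data_a))]


if __name__ == "__main__":
    masks = [0b0001, 0b0000, 0b0010, 0b0001]  # line 1 is fault-free
    print(pair_faulty_lines(masks))
    print(read_merged(["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"], 0b0001))
```

Under these assumptions, a merged pair stores one logical line in two physical lines, so capacity is traded for correctness; the granularity of the fault map and the pairing policy are design choices that this sketch does not take from the paper.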