Soft errors are a growing concern for processor reliability. Recent work has motivated architecture-level studies of soft errors since the architecture can mask many raw errors and architectural solutions can exploit workload knowledge. This paper proposes a model and tool, called SoftArch, to enable analysis of soft errors at the architecturelevel in modern processors. SoftArch is based on a probabilistic model of the error generation and propagation process in a processor. Compared to prior architecture-level tools, SoftArch is more comprehensive or faster. We demonstrate the use of SoftArch for an out-of-order superscalar processor running SPEC2000 benchmarks. Our results are consistent with, but more comprehensive than, prior work, and motivate selective and dynamic architecture-level soft error protection mechanisms.
Xiaodong Li, Sarita V. Adve, Pradip Bose, Jude A.