We have developed Argus, a novel approach for providing low-cost, comprehensive error detection for simple cores. The key to Argus is that the operation of a von Neumann core consists of four fundamental tasks—control flow, dataflow, computation, and memory access—that can be checked separately. We prove that Argus can detect any error by observing whether any of these tasks are performed incorrectly. We describe a prototype implementation, Argus-1, based on a single-issue, 4-stage, in-order processor to illustrate the potential of our approach. Experiments show that Argus-1 detects transient and permanent errors in simple cores with much lower impact on performance (<4% average overhead) and chip area (<17% overhead) than previous techniques.
Albert Meixner, Michael E. Bauer, Daniel J. Sorin