Sciweavers

DSN
2007
IEEE

Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance

14 years 7 months ago
Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance
Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper proposes a software-based multi-core alternative for transient fault tolerance using process-level redundancy (PLR). PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR’s softwarecentric approach to transient fault tolerance shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, PLR ignores many benign faults that do not propagate to affect program correctness. A real PLR prototype for running single-threaded applic...
Alex Shye, Tipp Moseley, Vijay Janapa Reddi, Josep
Added 02 Jun 2010
Updated 02 Jun 2010
Type Conference
Year 2007
Where DSN
Authors Alex Shye, Tipp Moseley, Vijay Janapa Reddi, Joseph Blomstedt, Daniel A. Connors
Comments (0)