Aggressive CMOS scaling will make future chip multiprocessors (CMPs) increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Exis...
Concentration of design effort for current single-chip Commercial-Off-The-Shelf (COTS) microprocessors has been directed towards performance. Reliability has not been the primary ...
Memory state can be corrupted by the impact of particles causing single-event upsets (SEUs). Understanding and dealing with these soft (or transient) errors is important for syste...
Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), ...
Atul Singh, Tathagata Das, Petros Maniatis, Peter ...
Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execut...