Device and interconnect fabrics at the nanoscale will have a density of defects and a susceptibility to transient faults far exceeding those of current silicon technologies. In this paper, we introduce a new performance optimization dimension at the microarchitecture level that can mitigate the overheads introduced by fault tolerance. This is achieved by directly exposing reliability versus delay design trade-offs while incorporating novel forms of speculation that use faster but less reliable versions of a microarchitecture's performance-critical components. Using a parameterized microarchitecture, we demonstrate the benefits of optimizing these trade-offs.

Categories and Subject Descriptors: C.1 [Computer Systems Organization] Processor Architectures, Performance of Systems, B.8 [Hardware] Performance and Reliability

General Terms: Performance, Design, Reliability.
Andrey V. Zykov, Elias Mizan, Margarida F. Jacome,