Large instruction window processors achieve high performance by exposing large amounts of instruction level parallelism. However, accessing large hardware structures typically required to buffer and process such instruction window sizes significantly degrade the cycle time. This paper proposes a novel Checkpoint Processing and Recovery (CPR) microarchitecture, and shows how to implement a large instruction window processor without requiring large structures thus permitting a high clock frequency. We focus on four critical aspects of a microarchitecture: 1) scheduling instructions, 2) recovering from branch mispredicts, 3) buffering a large number of stores and forwarding data from stores to any dependent load, and 4) reclaiming physical registers. While scheduling window size is important, we show the performance of large instruction windows to be more sensitive to the other three design issues. Our CPR proposal incorporates novel microarchitectural schemes for addressing these desig...
Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasa