Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rate as they are executed. To sustain high performance, integer ALU instructions typically have single-cycle latency, consequently requiring scheduling logic with the same single-cycle latency. Prior proposals have advocated the use of speculation in either the wakeup or select phases to enable pipelining of scheduling logic to achieve higher clock frequency. In contrast, this paper proposes macro-op scheduling, which systematically removes instructions with single-cycle latency from the machine by combining them into macro-ops, and performs nonspeculative pipelined scheduling of multi-cycle operations. Macro-op scheduling also increases the effective size of the scheduling window by enabling multiple instructions to occupy a single issue queue entry. We demonstrate that pipelined 2-cycle macro-op scheduling perfor...
Ilhyun Kim, Mikko H. Lipasti
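
The grouping idea in the abstract can be illustrated with a small sketch. It is not the paper's formation algorithm, which the abstract does not describe. It assumes a deliberately simplified rule: a single-cycle instruction is fused with the instruction immediately following it in program order when that instruction is also single-cycle and reads its result. The names `Instr`, `form_macro_ops`, and the register names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources, a latency in cycles."""
    name: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int


def form_macro_ops(program: List[Instr]) -> List[List[Instr]]:
    """Group instructions into issue queue entries (assumed adjacent-pair rule).

    A single-cycle producer is fused with the next instruction when that
    instruction is also single-cycle and consumes the producer's result.
    Every other instruction occupies an entry by itself.
    """
    entries: List[List[Instr]] = []
    i = 0
    while i < len(program):
        head = program[i]
        tail = program[i + 1] if i + 1 < len(program) else None
        if (
            tail is not None
            and head.latency == 1
            and tail.latency == 1
            and head.dest in tail.srcs
        ):
            entries.append([head, tail])  # one entry holds two instructions
            i += 2
        else:
            entries.append([head])
            i += 1
    return entries


def entry_latency(entry: List[Instr]) -> int:
    """Latency seen by the scheduler: the dependent chain inside the entry runs serially."""
    return sum(instr.latency for instr in entry)


# Illustrative program; opcodes, registers, and latencies are made up.
demo_program = [
    Instr("add", "r1", ("r2", "r3"), 1),
    Instr("sub", "r4", ("r1", "r5"), 1),   # depends on add
    Instr("load", "r6", ("r4",), 3),       # multi-cycle, left alone
    Instr("and", "r7", ("r6", "r1"), 1),
    Instr("or", "r8", ("r7", "r2"), 1),    # depends on and
]

demo_entries = form_macro_ops(demo_program)
for entry in demo_entries:
    print([instr.name for instr in entry], "latency", entry_latency(entry))
```

In this toy program, five instructions occupy three issue queue entries, and no entry has single-cycle latency, which is the property that lets the scheduling loop take more than one cycle without speculation.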