In superscalar architectures, out-of-order issue mechanisms increase performance by dynamically rescheduling instructions that cannot be statically reordered by the compiler. While such mechanisms are effective, they are also expensive in terms of both complexity and silicon area. Cost-effective alternatives are needed when area efficiency becomes a concern, such as when multiple processors are placed on a single die. In this paper we present Delayed Issue, a novel technique that allows instructions to be executed out of order without the hardware complexity of dynamic out-of-order issue. Instructions are inserted into per-functional-unit delay queues using delays specified by the compiler. Instructions within a queue are issued in order; out-of-order execution results from different instructions being inserted into the queues at different delays. In addition to improving performance, Delayed Issue reduces code bloat when loops are pipelined. The goal of this paper is to explo...
J. P. Grossman
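The delay-queue mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's hardware design: the names `Instr`, `DelayQueue`, and `simulate` are hypothetical, and the sketch assumes that one instruction is fetched per cycle in program order, that each queue behaves like a shift register whose head slot issues every cycle, and that the compiler never assigns two instructions to the same slot.

```python
from collections import namedtuple

# Hypothetical instruction record: the target functional unit and the
# compiler-specified delay (in cycles) are assumptions for illustration.
Instr = namedtuple("Instr", ["name", "unit", "delay"])


class DelayQueue:
    """Fixed-depth queue; slot i issues i cycles from now (assumed model)."""

    def __init__(self, depth):
        self.slots = [None] * depth

    def insert(self, instr):
        # The compiler chooses the slot; a collision is treated as a
        # scheduling error in this sketch.
        if instr.delay >= len(self.slots):
            raise ValueError("delay exceeds queue depth")
        if self.slots[instr.delay] is not None:
            raise ValueError("slot already occupied")
        self.slots[instr.delay] = instr

    def step(self):
        # Issue whatever sits at the head, then shift every slot forward.
        head = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return head

    def busy(self):
        return any(slot is not None for slot in self.slots)


def simulate(program, units, depth=4):
    """Return a list of (cycle, instruction name) in issue order."""
    queues = {unit: DelayQueue(depth) for unit in units}
    pending = list(program)
    issued = []
    cycle = 0
    while pending or any(q.busy() for q in queues.values()):
        if pending:
            # Fetch one instruction per cycle, in program order.
            instr = pending.pop(0)
            queues[instr.unit].insert(instr)
        for unit in units:
            head = queues[unit].step()
            if head is not None:
                issued.append((cycle, head.name))
        cycle += 1
    return issued


# Program order is load, add, mul. The add is delayed two cycles, for
# example to wait for the load result, so the later mul issues before it.
program = [
    Instr("load", "mem", 0),
    Instr("add", "alu", 2),
    Instr("mul", "alu", 0),
]
trace = simulate(program, ["mem", "alu"])
print(trace)
```

In this run the `mul` is fetched after the `add` but issues one cycle earlier, even though each queue only ever issues from its head. The reordering comes entirely from the insertion delays, with no dynamic dependency checking in the simulated hardware.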