We examine two pipeline structures which are employed in commercial microprocessors. The first is the load-use interlock (LUI) pipeline, which employs an interlock to ensure correct operation during load-use hazards. The second is the address-generation interlock (AGI) pipeline. It eliminates the load-use hazard, but has an address-generation hazard which requires an address-generation interlock for correct operation. We compare the performance of these two pipelines on existing binaries and on applications which have been recompiled with a local code scheduler that understands the difference in the pipeline structures. When branch prediction is more than 80% accurate and the data cache access time is greater than two cycles, the AGI pipeline performs significantly better than the LUI pipeline on existing binaries. Recompiling the benchmarks with a new local code scheduler provides little additional performance improvement.
Michael Golden, Trevor N. Mudge