As the issue widthof superscalar processors is increased, instructionfetch bandwidthrequirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instructioncaches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache witha trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4 kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
Eric Rotenberg, Steve Bennett, James E. Smith