Trace caches enable high bandwidth, low latency instruction supply, but have a high miss penalty and relatively large working sets. Consequently, their performance may suffer due to capacity and compulsory misses. Trace preconstruction augments a trace cache by performing a function analogous to prefetching. The trace preconstruction mechanism observes the processor's instruction dispatch stream to detect opportunities for jumping ahead of the processor. After doing so, the preconstruction mechanism fetches static instructions from the predicted future region of the program, and constructs a set of traces in advance of when they are needed. Trace preconstruction can significantly increase both the performance of the trace cache and the robustness of the trace cache to varying workloads. All but one of the SPECint95 benchmarks see a notable reduction in trace cache miss rates from preconstruction. The three benchmarks that have the largest working set (gcc, go and vortex) see a 30...
Quinn Jacobson, James E. Smith