While trace cache, value prediction, and prefetching have been shown to be effective in the single-threaded superscalar, there has been no analysis of these techniques in a Simulta...
As the issue widthof superscalar processors is increased, instructionfetch bandwidthrequirements will also increase. It will become necessary to fetch multiple basic blocks per cy...
The trace cache is a recently proposed solution to achieving high instruction fetch bandwidth by buffering and reusing dynamic instruction traces. This work presents a new block-b...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
Trace caches enable high bandwidth, low latency instruction supply, but have a high miss penalty and relatively large working sets. Consequently, their performance may suffer due ...
This work examines dynamic cluster assignment for a clustered trace cache processor (CTCP). Previously proposed cluster assignment techniques run into unique problems as issue wid...
As superscalar processors become increasingly wide, it is inevitable that the large set of instructions to be fetched every cycle will span multiple noncontiguous basic blocks. Th...
This paper presents a new technique for efficient usage of small trace caches. A trace cache can significantly increase the performance of wide out-oforder processors, but to be e...
Fast instruction decoding is a challenge for the design of CISC microprocessors. A well-known solution to overcome this problem is using a trace cache. It stores and fetches alrea...