Path profiles record the frequencies of execution paths through a program. Until now, the best global instruction schedulers have relied upon profile-gathered frequencies of condi...
We present an architecture that features dynamic multithreading execution of a single program. Threads are created automatically by hardware at procedure and loop boundaries and e...
The fill unit is the structure which collects blocks of instructions and combines them into multi-block segments for storage in a trace cache. In this paper, we expand the role of...
Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are p...
Scott Rixner, William J. Dally, Ujval J. Kapasi, B...
This paper evaluates several mechanisms for repairing the return-address stack after branch mispredictions. The return-address stack is a small but important structure for achievi...
Kevin Skadron, Pritpal S. Ahuja, Margaret Martonos...