This paper presents a quantification of the timing effects that advanced processor features like data and instruction cache, pipelines, branch prediction units and out-oforder ex...
Several ILP limit studies indicate the presence of considerable ILP across dynamically far-apart instructions in program execution. This paper proposes a hardware mechanism, dynam...
Modern processors improve instruction level parallelism by speculation. The outcome of data and control decisions is predicted, and the operations are speculatively executed and o...
Dirk Grunwald, Artur Klauser, Srilatha Manne, Andr...
Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these diffe...
Kimberly Keeton, David A. Patterson, Yong Qiang He...
In modern computer systems loops present a great deal of opportunities for increasing Instruction Level and Thread Level Parallelism. Loop unrolling is a technique used to obtain ...