Current superscalar processors, both RISC and CISC, require substantial instruction fetch and decode bandwidth to keep multiple functional units utilized. While CISC instructions ...
Instruction Level Parallelism (ILP) speedups of an order-of-magnitude or greater may be possible using the techniques described herein. Traditional speculative code execution is t...
Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...