Operand bypass logic might be one of the critical structures for future microprocessors to achieve high clock speed. The delay of the logic imposes the execution time budget to be ...
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache con...
Pipeline flushes due to branch mispredictions is one of the most serious problems facing the designer of a deeply pipelined, superscalar processor. Many branch predictors have bee...
Marius Evers, Sanjay J. Patel, Robert S. Chappell,...
Accurate branch prediction is essential for obtaining high performance in pipelined superscalar processors that execute instructions speculatively. Some of the best current predic...
This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control s...