Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...
State of the art microprocessors achieve high performance by executing multiple instructions per cycle. In an out-oforder engine, the instruction scheduler is responsible for disp...
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. ...
The load instructions of some of the bioinformatics applications in the BioPerf suite possess interesting characteristics: only a few static loads cover almost the entire dynamic ...