This paper introduces a novel technique that leverages value prediction and multithreading on a simultaneous multithreading processor to achieve higher performance in a single-threaded application. By allowing the value-speculative execution to proceed in a separate thread, this technique overcomes barriers that make traditional value prediction relatively ineffective for tolerating long-latency loads. We show that this technique can be as much as 2-5 times more effective than traditional value prediction, achieving more than 40% average performance gain on the SPEC benchmarks with realistic hardware parameters. These gains come from two effects: greater separation between the stalled load and the speculative execution, and the ability to speculate on multiple values for a single load.
Nathan Tuck, Dean M. Tullsen