Runahead execution improves memory latency tolerance without significantly increasing processor complexity. Unfortunately, a runahead execution processor executes significantly more instructions than a conventional processor, sometimes without providing any performance benefit, which makes it inefficient. In this article, we identify the causes of inefficiency in runahead execution and propose simple yet effective techniques that make a runahead processor more efficient, thereby reducing its energy consumption. The proposed efficiency techniques reduce the extra instructions executed by a runahead processor from 26.5% to 6.2% without significantly reducing the 22% performance improvement that runahead execution provides.
Onur Mutlu, Hyesoon Kim, Yale N. Patt