Even sophisticated branch-prediction techniques necessarily suffer some mispredictions, and even relatively small mispredict rates hurt performance substantially in current-generation processors. In this paper, we investigate schemes for improving performance in the face of imperfect branch predictors by having the processor simultaneously execute code from both the taken and not-taken outcomes of a branch. This paper presents data regarding the limits of multipath execution, considers fetch-bandwidth needs for multipath execution, and discusses various dynamic confidence-prediction schemes that gauge the likelihood of branch mispredictions. Our evaluations consider executing along several (2
Pritpal S. Ahuja, Kevin Skadron, Margaret Martonos