Despite years of study, branch mispredictions remain a significant performance impediment in pipelined superscalar processors. In general, the branch misprediction penalty can be substantially larger than the frontend pipeline length, which is often equated with the misprediction penalty. We identify and quantify five contributors to the branch misprediction penalty: (i) the frontend pipeline length; (ii) the number of instructions since the last miss event (branch misprediction, I-cache miss, or long D-cache miss), which is related to the burstiness of miss events; (iii) the inherent ILP of the program; (iv) the functional unit latencies; and (v) the number of short (L1) D-cache misses. The characterizations in this paper are driven by ‘interval analysis’, an analytical approach that models superscalar processor performance as a sequence of inter-miss intervals.
Stijn Eyerman, James E. Smith, Lieven Eeckhout