The use of large instruction windows coupled with aggressive out-oforder and prefetching capabilities has provided significant improvements in processor performance. In this paper, we quantify the effects of increased out-of-order aggressiveness on a processor's memory ordering/consistency model as well as an application's cache behavior. We observe that increasing reorder buffer sizes cause less than one third of issued memory instructions to be executed in actual program order. We show that increasing the reorder buffer size from 80 to 512 entries results in an increase in the frequency of memory traps by a factor of six and an increase in total execution overhead by 10?40%. Additionally, we observe that the reordering of memory instructions increases the L1 data cache accesses by 10?60% and the L1 data cache misses by 10?20%. These findings reveal that increased out-of-order capability can waste energy in two ways. First, re-fetching and re-executing instructions flushed ...
Aamer Jaleel, Bruce L. Jacob