As microprocessor speeds increase, memory bandwidth is increasingly the performance bottleneck for microprocessors. This has occurred because innovation and technological improvements in processor design have outpaced advances in memory design. Most attempts at addressing this problem have involved hardware solutions. Unfortunately, these solutions do little to help the situation with respect to current microprocessors. In previous work, we developed, implemented, and evaluated an algorithm that exploited the ability of newer machines with wide buses to load/store multiple floating-point operands in a single memory reference. This paper describes a general code improvement algorithm that transforms code to better exploit the available memory bandwidth on existing microprocessors as well as wide-bus machines. Where possible and advantageous, the algorithm coalesces narrow memory references into wide ones. An interesting characteristic of the algorithm is that some decisions about the a...
Jack W. Davidson, Sanjay Jinturkar
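
To make the idea of coalescing narrow memory references into wide ones concrete, the following is a minimal source-level sketch in C. It is an illustration only, not the algorithm described in the paper, which operates inside the compiler. The function names `sum_bytes_narrow` and `sum_bytes_wide` are hypothetical, and the sketch assumes a machine on which a single 32-bit load is cheaper than four 8-bit loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Narrow version: issues one 8-bit load per array element. */
uint32_t sum_bytes_narrow(const uint8_t *a, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Coalesced version (hypothetical illustration): one 32-bit load
 * fetches four adjacent bytes, which are then extracted with shifts
 * and masks in registers. memcpy keeps the wide access well defined
 * regardless of alignment; compilers typically lower it to a single
 * load where the target allows. Because all four bytes are summed,
 * the result does not depend on byte order. */
uint32_t sum_bytes_wide(const uint8_t *a, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, a + i, sizeof w);      /* one wide memory reference */
        sum += (w & 0xFFu) + ((w >> 8) & 0xFFu)
             + ((w >> 16) & 0xFFu) + (w >> 24);
    }

    /* Remainder loop handles the final n % 4 elements narrowly. */
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Both functions return the same value for any input. The wide version replaces four memory references with one at the cost of extra register operations, which is why such a transformation pays off only "where possible and advantageous".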