The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance gap severely impacts stream-orientated computations such as (de)compression, encryption, and scientific vector processing. This paper describes the development of an intelligent memory interface that can exploit compiler-provided information on streamed memory access patterns to improve memory bandwidth. Simulation results show that shared-memory multiprocessor systems equipped with such an interface can deliver nearly the full attainable bandwidth with relatively modest hardware costs.
Sally A. McKee, William A. Wulf