Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work explores the tradeoffs between effectively utilizing parallelism and memory hierarchy on shared-memory multiprocessors. We present a simple, but surprisingly accurate, memory model to determine cache line reuse from both multiple accesses to the same memory location and from consecutive memory access. The model is used in memory optimizing and loop parallelization algorithms that effectively exploit data locality and parallelism in concert. We demonstrate the efficacy of this approach with very encouraging experimental results.
Ken Kennedy, Kathryn S. McKinley
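
The two kinds of cache line reuse named in the abstract (repeated access to the same location, and consecutive access) can be illustrated with a small cost estimate. The following is a minimal sketch under stated assumptions, not the authors' exact formulation: the function names `ref_cost` and `loop_cost`, the three reference categories, the column-major array layout, and the matrix-multiply example are all introduced here for illustration only.

```python
from math import ceil

def ref_cost(kind, trip, line_elems, stride=1):
    """Estimate cache lines touched by one array reference over `trip`
    iterations of a candidate innermost loop (hypothetical helper)."""
    if kind == "invariant":
        # Same memory location on every iteration: one line is reused.
        return 1
    if kind == "consecutive":
        # Small stride along contiguous memory: a new line is needed
        # only once every line_elems / stride iterations.
        return ceil(trip * stride / line_elems)
    # No reuse assumed: every iteration touches a different line.
    return trip

def loop_cost(ref_kinds, trip, line_elems):
    """Sum the per-reference estimates for one candidate innermost loop."""
    return sum(ref_cost(kind, trip, line_elems) for kind in ref_kinds)

# Assumed example: C(i,j) = C(i,j) + A(i,k) * B(k,j) with column-major
# arrays, 512 iterations per loop, and 8 array elements per cache line.
TRIP, LINE_ELEMS = 512, 8
candidates = {
    # innermost loop: classification of C(i,j), A(i,k), B(k,j)
    "i": ["consecutive", "consecutive", "invariant"],
    "k": ["invariant", "none", "consecutive"],
    "j": ["none", "invariant", "none"],
}
costs = {loop: loop_cost(kinds, TRIP, LINE_ELEMS)
         for loop, kinds in candidates.items()}
best_inner = min(costs, key=costs.get)
```

Under these assumptions the `i` loop has the lowest estimate, so it would be the preferred innermost loop for locality. Choosing which of the remaining loops to run in parallel is a separate decision that this sketch does not model.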