There has been much recent work on improving the locality of loop nests in scientific programs through loop and data layout optimizations. However, little attention has been paid to optimizing locality in whole programs, particularly in the presence of procedures. Current techniques do not propagate layout optimizations across procedure boundaries, yet such propagation is critical for realistic scientific codes, since the cost of explicitly transforming memory layouts at procedure boundaries can be very high. In this paper we present a locality optimization framework that uses both loop and data transformations to improve cache locality program-wide. Our framework propagates layout (or locality) constraints across procedures as a system of equalities and involves two traversals of the call graph representation of the program. Preliminary experimental results obtained on an R10000-based system demonstrate the effectiveness of the framework.
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja