We present a cache locality optimization technique that can optimize a loop nest even if the arrays referenced have different layouts in memory. Such a capability is required for a global locality optimization framework that applies both loop and data transformations to a sequence of loop nests for optimizing locality. Our method finds a nonsingular iteration-space transformation matrix such that in a given loop nest spatial locality is exploited in the innermost loops where it is most useful. The method builds inverse of a non-singular transformation matrix column-by-column starting from the rightmost column. In addition, our approach can work in those cases where the data layouts of a subset of the referenced arrays is unknown. Experimental results on an 8-processor SGI Origin 2000 show that our technique reduces execution times by up to 72%.
Mahmut T. Kandemir, J. Ramanujam, Alok N. Choudhar