Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal. This restricts parallelis...
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing s...
Edwin Hsing-Mean Sha, Chenhua Lang, Nelson L. Pass...
In s-to-p broadcasting, s processors in a p-processor machine contain a message to be broadcast to all the processors, 1 s p. We present a number of different broadcasting algori...
The ability to automatically parallelize standard programming languages results in program portability across a wide range of machine architectures. It is the goal of the Polaris ...
William Blume, Rudolf Eigenmann, Keith Faigin, Joh...