This work identifies practical compiling techniques for scalable shared memory machines. To this end, we have focused on experimental studies using a real machine and representative ...
Yunheung Paek, Angeles G. Navarro, Emilio L. Zapat...
Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has b...
Wolfram Amme, Peter Braun, Eberhard Zehendner, Fra...
In this paper, we present an efficient algorithm, called CASS-II, for task clustering without task duplication. Unlike the DSC algorithm, which is empirically the best known algor...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
In this paper, we present efficient methods for multidimensional array redistribution. Building on previous work, the basic-cycle calculation technique, we present a basic-block ...