: In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must...
Abstract. The register allocation in loops is generally performed after or during the software pipelining process. This is because doing a conventional register allocation at firs...
The accurate modeling of the electronic structure of atoms and molecules involves computationally intensive tensor contractions involving large multi-dimensional arrays. The effi...
Daniel Cociorva, Xiaoyang Gao, Sandhya Krishnan, G...
Efficient partitioning of parallel loops plays a critical role in high performance and efficient use of multiprocessor systems. Although a significant amount of work has been don...
Arun Kejariwal, Alexandru Nicolau, Utpal Banerjee,...
The polyhedral model is extensively used for analyses and transformations of regular loop programs, one of the most important being automatic parallelization. The model, however, ...