Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization,vectorization, and memory management. For...
Automatic library generators, such as ATLAS [11], Spiral [8] and FFTW [2], are promising technologies to generate efficient code for different computer architectures. The library...
Daniel Orozco, Liping Xue, Murat Bolat, Xiaoming L...
It is widely known that parallel operation execution in multiprocessor systems generates a respective increase in memory accesses. Since the memory and bus subsystems provide a li...
Grigoris Dimitroulakos, Michalis D. Galanis, Costa...
GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...