A microprocessor's performance is fundamentally limited by the rate at which it can resolve branch mispredictions. Control independence (CI) architectures look for useful con...
Kshitiz Malik, Mayank Agarwal, Sam S. Stone, Kevin...
In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implement...
The data distribution problem is very complex, because it involves trade-offdecisions between minimizing communication and maximizing parallelism. A common approach towards solving...
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Mixin modules are proposed as an extension of a class-based programming language. Mixin modules combine parallel extension of classes, including extension of the self types for th...