Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Abstract. The degree of locality of a program re ects the level of temporal and spatial concentration of related data and computations. Locality optimization can speed up programs ...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
- We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well sui...
Scheduling DAGs with communication times is the theoretical basis for achieving efficient parallelism on distributed memory systems. We generalize Graham's task-level in a ma...