Memory systems for conventional large-scale computers provide only limited bytes/s of data bandwidth when compared to their flop/s of instruction execution rate. The resulting bo...
Robert J. Drost, Craig Forrest, Bruce Guenin, Ron ...
The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance ga...
The paper presents eficient scalable algorithms for performing Prefix (PC) and General Prefix (GPC) Computations on a Distributed Shared Memory ( D S M ) system with applications....
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to op...
Yanwei Niu, Ziang Hu, Kenneth E. Barner, Guang R. ...
In this work we investigate the impact of evolving memory system features, such as large on-chip caches, automatic prefetch, and the growing distance to main memory on 3D stencil ...
Shoaib Kamil, Parry Husbands, Leonid Oliker, John ...