Sciweavers

118 search results - page 14 / 24
» Communication and Memory Optimal Parallel Data Cube Construc...
Sort
View
HPCA
2007
IEEE
14 years 1 months ago
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Liqun Cheng, John B. Carter, Donglai Dai
IPPS
1998
IEEE
13 years 11 months ago
Locality Optimization for Program Instances
Abstract. The degree of locality of a program re ects the level of temporal and spatial concentration of related data and computations. Locality optimization can speed up programs ...
Claudia Leopold
EUROPAR
2001
Springer
13 years 11 months ago
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Sverker Holmgren, Dan Wallin
PDP
2010
IEEE
14 years 1 days ago
A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
- We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well sui...
Marco Ament, Günter Knittel, Daniel Weiskopf,...
JSA
2000
115views more  JSA 2000»
13 years 6 months ago
Scheduling optimization through iterative refinement
Scheduling DAGs with communication times is the theoretical basis for achieving efficient parallelism on distributed memory systems. We generalize Graham's task-level in a ma...
Mayez A. Al-Mouhamed, Adel Al-Massarani