Sciweavers

» Mechanisms for Mapping High-Level Parallel Performance Data
HIPS
1998
IEEE
15 years 6 months ago
Implementing Automatic Coordination on Networks of Workstations
Distributed shared objects are a well-known approach to achieve independence of the memory model for parallel programming. The illusion of shared (global) objects is a conabstracti...
Christian Weiß, Jürgen Knopp, Hermann H...
ASAP
1996
IEEE
15 years 6 months ago
A Synthesis System For Bus-Based Wavefront Array Architectures
A datapath synthesis system (DPSS) for a bus-based wavefront array architecture, called rDPA (reconfigurable datapath architecture), is presented. An internal data bus to the arra...
Reiner W. Hartenstein, Jürgen Becker, Michael...
ASPLOS
2009
ACM
15 years 9 months ago
Performance analysis of accelerated image registration using GPGPU
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming e...
Peter Bui, Jay B. Brockman
IPPS
2000
IEEE
15 years 7 months ago
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...
PDP
2010
IEEE
15 years 8 months ago
A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well sui...
Marco Ament, Günter Knittel, Daniel Weiskopf,...