Sciweavers

33 search results - page 4 / 7
» A memory optimization technique for software-managed scratch...
Sort
View
PC
2010
190views Management» more  PC 2010»
13 years 5 months ago
High-performance cone beam reconstruction using CUDA compatible GPUs
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara
CC
2008
Springer
144views System Software» more  CC 2008»
13 years 9 months ago
Control Flow Emulation on Tiled SIMD Architectures
Heterogeneous multi-core and streaming architectures such as the GPU, Cell, ClearSpeed, and Imagine processors have better power/ performance ratios and memory bandwidth than tradi...
Ghulam Lashari, Ondrej Lhoták, Michael McCo...
IPPS
2010
IEEE
13 years 5 months ago
Optimization of linked list prefix computations on multithreaded GPUs using CUDA
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
Zheng Wei, Joseph JáJá
CCGRID
2011
IEEE
12 years 11 months ago
Small Discrete Fourier Transforms on GPUs
– Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...
S. Mitra, A. Srinivasan
ICS
2009
Tsinghua U.
14 years 2 months ago
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
Jiayuan Meng, Kevin Skadron