Sciweavers

52 search results - page 10 / 11
» Loop Alignment for Memory Accesses Optimization
PC
2010
13 years 6 months ago
High-performance cone beam reconstruction using CUDA compatible GPUs
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the NVIDIA graphics processing unit (GPU). This paper prese...
Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara
ICS
2005
Tsinghua U.
14 years 1 month ago
Lightweight reference affinity analysis
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...
Xipeng Shen, Yaoqing Gao, Chen Ding, Roch Archamba...
CODES
2005
IEEE
14 years 1 month ago
Improving superword level parallelism support in modern compilers
Multimedia vector instruction sets are becoming ubiquitous in most of the embedded systems used for multimedia, networking and communications. However, current compiler technology...
Christian Tenllado, Luis Piñuel, Manuel Pri...
TVLSI
2010
13 years 2 months ago
A Low-Power DSP for Wireless Communications
This paper proposes a low-power high-throughput digital signal processor (DSP) for baseband processing in wireless terminals. It builds on our earlier architecture--Signal processi...
Hyunseok Lee, Chaitali Chakrabarti, Trevor N. Mudg...
IPPS
1998
IEEE
13 years 12 months ago
High Performance Linear Algebra Package LAPACK90
LAPACK90 is a set of FORTRAN90 subroutines that interface FORTRAN90 with LAPACK. All LAPACK driver subroutines, including expert drivers, and some LAPACK computationals ha...
Jack Dongarra, Jerzy Wasniewski