Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...
Multimedia vector instruction sets are becoming ubiquitous in most of the embedded systems used for multimedia, networking and communications. However, current compiler technology...
This paper proposes a low-power high-throughput digital signal processor (DSP) for baseband processing in wireless terminals. It builds on our earlier architecture--Signal processi...
Hyunseok Lee, Chaitali Chakrabarti, Trevor N. Mudg...
Abstract. LAPACK90 is a set of LAPACK90 subroutines which interfaces FORTRAN90 with LAPACK. All LAPACK driver subroutines including expert drivers and some LAPACK computationals ha...