We develop a hierarchical matrix construction algorithm using matrix-vector multiplications, based on the randomized singular value decomposition of low-rank matrices. The algorith...
Modern multi-processor system-on-chip (MPSoC) designs have high bandwidth constraints which must be satisfied by the underlying communication architecture. Bus matrix based com...
Sudeep Pasricha, Nikil D. Dutt, Mohamed Ben-Romdha...
Many applications arising in a variety of fields can be well illustrated by the task of recovering the low-rank and sparse components of a given matrix. Recently, it was discovered...
This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SM) algorithm on compute unified device architecture (CUDA)-compatible graphic pr...
Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...