This paper presents a new partitioning algorithm to perform matrix multiplication on two interconnected heterogeneous processors. Data is partitioned in a way which minimizes the ...
Multi-context FPGAs have multiple memory bits per configuration bit forming configuration planes for fast switching between contexts. Large amount of memory causes significant ove...
We present a new parallel algorithm to compute an exact triangularization of large square or rectangular and dense or sparse matrices in any field. Using fast matrix multiplicatio...
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately,these twoobjectives have usuallybeen considered independentl...
Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-v...