Towards batched linear solvers on accelerated hardware platforms


As hardware evolves, an increasingly effective approach to developing energy-efficient, high-performance solvers is to design them to work on many small, independent problems. Many applications already need this functionality, especially on GPUs, which are currently about four to five times more energy efficient than multicore CPUs per floating-point operation. In this paper, we describe the development of the main one-sided factorizations (LU, QR, and Cholesky) needed to factorize a set of small dense matrices in parallel. We refer to such algorithms as batched factorizations. Our approach is based on representing the algorithms as a sequence of batched BLAS routines for GPU-contained execution. This is similar in functionality to the LAPACK and hybrid MAGMA algorithms for large-matrix factorizations, but it differs from a straightforward approach in which each of the GPU's symmetric multiprocessors factorizes a single problem ...
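To make the idea of a batched factorization concrete, here is a minimal plain-Python sketch of a batched Cholesky factorization. It is not the authors' GPU implementation: the function name `batched_cholesky` and the list-of-lists matrix layout are assumptions made for illustration. The point it shows is the loop order. The factorization step `k` is the outer loop, and each inner `for a in work` loop applies one operation to every matrix in the batch, standing in for a single batched BLAS-style routine call.

```python
import math

def batched_cholesky(batch):
    """Factor each symmetric positive definite matrix A in `batch` as A = L * L^T.

    Hypothetical helper for illustration: all matrices must have the same
    size n x n. Returns a list of lower-triangular factors L.
    """
    n = len(batch[0])
    # Work on copies so the input matrices are left untouched.
    work = [[row[:] for row in a] for a in batch]
    for k in range(n):
        # Batched step 1: finish column k of every matrix in the batch.
        for a in work:
            a[k][k] = math.sqrt(a[k][k])
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
        # Batched step 2: rank-1 update of the trailing lower triangle
        # of every matrix in the batch.
        for a in work:
            for j in range(k + 1, n):
                for i in range(j, n):
                    a[i][j] -= a[i][k] * a[j][k]
    # Zero the strict upper triangle so that only L remains.
    for a in work:
        for i in range(n):
            for j in range(i + 1, n):
                a[i][j] = 0.0
    return work

# Usage: factor two small matrices in one call.
factors = batched_cholesky([
    [[4.0, 2.0], [2.0, 3.0]],
    [[9.0, 3.0], [3.0, 5.0]],
])
print(factors[1])  # [[3.0, 0.0], [1.0, 2.0]]
```

On a GPU, each of the two inner batch loops would presumably map to one kernel launch that covers the whole batch, rather than one launch per matrix; that mapping is an interpretation of the abstract, not something this sketch implements.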


Distributed And Parallel Computing | PPOPP 2015 |


Post Info

Added	16 Apr 2016
Updated	16 Apr 2016
Type	Journal
Year	2015
Where	PPOPP
Authors	Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra
