A Note on Auto-tuning GEMM for GPUs


Download www.netlib.org

The development of high-performance dense linear algebra (DLA) depends critically on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is particularly true for Graphics Processing Units (GPUs), as evidenced by recently published results on DLA for GPUs that rely on highly optimized GEMM [13, 11]. However, the current best GEMM performance, e.g. up to 375 GFlop/s in single precision and up to 75 GFlop/s in double precision arithmetic on NVIDIA’s GTX 280, is difficult to achieve. The development requires extensive GPU knowledge and even reverse engineering to understand undocumented details of the architecture that have been of key importance in the development [12]. In this paper, we describe some GPU GEMM auto-tuning optimization techniques that allow us to keep up with changing hardware by rapidly reusing, rather than reinventing, the existing ideas. Auto-tuning, as we show in this paper, is a very practical solution where in additi...
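To make the idea of auto-tuning concrete, the following is a minimal sketch of an empirical parameter search, not the method of the paper. It assumes a toy, pure-Python blocked matrix multiply on the CPU in place of a GPU kernel, and the names `gemm_flops`, `blocked_gemm`, and `autotune` are hypothetical. It times each candidate tile size and reports the rate using the usual 2mnk operation count for GEMM.

```python
import time


def gemm_flops(m, n, k):
    """Operation count for C = A * B with A (m x k) and B (k x n):
    one multiply and one add per inner-loop step."""
    return 2 * m * n * k


def blocked_gemm(A, B, n, block):
    """Multiply two n x n matrices (lists of lists) tile by tile.
    `block` is the tunable parameter (hypothetical stand-in for a
    GPU kernel's tile size)."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, n, block):
            for j0 in range(0, n, block):
                for i in range(i0, min(i0 + block, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(k0, min(k0 + block, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + block, n)):
                            Ci[j] += a * Bk[j]
    return C


def autotune(n, candidates):
    """Time every candidate tile size on an n x n problem and return
    (best_block, {block: GFlop/s}). Assumption: a single timed run per
    candidate; a real tuner would repeat runs and discard warm-up."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    rates = {}
    for block in candidates:
        start = time.perf_counter()
        blocked_gemm(A, B, n, block)
        elapsed = max(time.perf_counter() - start, 1e-12)
        rates[block] = gemm_flops(n, n, n) / elapsed / 1e9
    best = max(rates, key=rates.get)
    return best, rates


if __name__ == "__main__":
    best, rates = autotune(48, [4, 8, 16, 48])
    for block, rate in sorted(rates.items()):
        print(f"block={block:3d}  {rate:.4f} GFlop/s")
    print("selected block size:", best)
```

The measured rates depend entirely on the machine and interpreter, so no expected output is shown. The point of the sketch is the structure: a parameterized kernel, a set of candidate parameters, and a timing loop that selects the fastest variant, which can be rerun unchanged when the hardware changes.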

Yinan Li, Jack Dongarra, Stanimire Tomov


Applied Computing | Double Precision | GEMM | GTX 280 | ICCS 2009 |


Post Info

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	ICCS
Authors	Yinan Li, Jack Dongarra, Stanimire Tomov
