Sciweavers

PPOPP
2010
ACM

Model-driven autotuning of sparse matrix-vector multiply on GPUs

We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on classical blocked compressed sparse row (BCSR) and blocked ELLPACK (BELLPACK) storage formats, match or exceed state-of-the-art implementations. For instance, our best BELLPACK implementation achieves up to 29.0 Gflop/s in single-precision and 15.7 Gflop/s in double-precision on the NVIDIA T10P multiprocessor (C1060), enhancing prior state-of-the-art unblocked implementations (Bell and
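For readers unfamiliar with the storage formats named in the abstract, the following is a minimal sketch of SpMV on the classical, unblocked ELLPACK layout. It is not the authors' BELLPACK implementation and it runs on the CPU in plain Python. The function names `csr_to_ell` and `spmv_ell` and the example matrix are assumptions made purely for illustration.

```python
# Illustrative sketch only: unblocked ELLPACK, not the BELLPACK variant from the paper.
# Function names and the example matrix are hypothetical.

def csr_to_ell(n_rows, row_ptr, col_idx, vals):
    """Pad every row of a CSR matrix to the same width (the longest row)."""
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0)
    ell_cols = [[0] * width for _ in range(n_rows)]    # padded slots point at column 0
    ell_vals = [[0.0] * width for _ in range(n_rows)]  # padded slots hold 0.0
    for i in range(n_rows):
        for k, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][k] = col_idx[p]
            ell_vals[i][k] = vals[p]
    return ell_cols, ell_vals


def spmv_ell(ell_cols, ell_vals, x):
    """Compute y = A * x; padded entries multiply by 0.0 and contribute nothing."""
    return [
        sum(v * x[c] for c, v in zip(cols, row_vals))
        for cols, row_vals in zip(ell_cols, ell_vals)
    ]


# Example: A = [[4, 0, 1],
#               [0, 2, 0],
#               [3, 0, 5]] stored in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

ell_cols, ell_vals = csr_to_ell(3, row_ptr, col_idx, vals)
y = spmv_ell(ell_cols, ell_vals, x)
```

Because every row has the same padded width, each row performs an identical number of multiply-adds, which is the regularity that makes ELLPACK-style layouts attractive on GPUs. The cost is wasted storage and arithmetic on the padding when row lengths vary widely.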
Jee W. Choi, Amik Singh, Richard W. Vuduc
Added 05 Mar 2010
Updated 08 Mar 2010
Type Conference
Year 2010
Where PPOPP
Authors Jee W. Choi, Amik Singh, Richard W. Vuduc