Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demand for memory bandwidth, which modern commodity architectures cannot yet offer in abundance. One of the most promising optimization techniques for SpMV is blocking, which can reduce the size of the indexing structures used to store a sparse matrix and therefore alleviate the pressure on the memory subsystem. However, blocking methods can severely degrade performance if not used properly. In this paper, we study and evaluate a number of representative blocking storage formats and present a performance model that can accurately select the most suitable blocking storage format and the corresponding block shape and size for a specific sparse matrix. Our model considers both the memory and the computational part of the kernel, the latter of which can be non-negligible when applying blocking...
Vasileios Karakasis, Georgios I. Goumas, Nectarios
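The trade-off described in the abstract, where blocking shrinks the indexing structures but can backfire when used improperly, can be illustrated with a rough storage estimate. The sketch below is not the performance model of the paper. It compares plain CSR against a generic fixed-size blocked CSR layout with aligned r x c blocks and explicit zero padding, which may differ from the formats evaluated here. The function names `csr_bytes` and `bcsr_bytes`, the 4-byte indices, and the 8-byte values are assumptions made for illustration only.

```python
INDEX_BYTES = 4  # assumption: 32-bit column indices and row pointers
VALUE_BYTES = 8  # assumption: double-precision nonzero values


def csr_bytes(n_rows, coords):
    """Estimated CSR footprint: one column index per nonzero,
    a row-pointer array of n_rows + 1 entries, and the values."""
    nnz = len(coords)
    return nnz * INDEX_BYTES + (n_rows + 1) * INDEX_BYTES + nnz * VALUE_BYTES


def bcsr_bytes(n_rows, coords, r, c):
    """Estimated footprint of a fixed-size blocked CSR layout with
    aligned r x c blocks: one column index per block, one pointer per
    block row, and r * c stored values per block (zeros are padded in)."""
    blocks = {(i // r, j // c) for i, j in coords}  # distinct nonempty blocks
    n_block_rows = -(-n_rows // r)                  # ceiling division
    n_blocks = len(blocks)
    return (n_blocks * INDEX_BYTES
            + (n_block_rows + 1) * INDEX_BYTES
            + n_blocks * r * c * VALUE_BYTES)


# Hypothetical 8x8 matrix made of four dense 2x2 blocks on the diagonal.
dense_blocks = {(2 * b + di, 2 * b + dj)
                for b in range(4) for di in range(2) for dj in range(2)}

# Hypothetical 8x8 matrix with nonzeros on the main diagonal only.
diagonal = {(i, i) for i in range(8)}

print(csr_bytes(8, dense_blocks), bcsr_bytes(8, dense_blocks, 2, 2))  # 228 164
print(csr_bytes(8, diagonal), bcsr_bytes(8, diagonal, 2, 2))          # 132 164
```

Under these assumptions, the matrix with dense 2x2 substructures needs fewer bytes when blocked, because four block indices replace sixteen column indices. The diagonal matrix needs more, because every block stores two padded zeros. This only counts memory traffic in a crude way. It says nothing about the computational part of the kernel, which the abstract notes can be non-negligible when blocking is applied.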