Sparse Matrix-Vector multiplication (SpMV) is a challenging computational kernel because its performance depends heavily on both the input matrix and the underlying architecture. The main bottleneck of SpMV is its high demand for memory bandwidth, which modern commodity architectures cannot yet supply in abundance. One of the most promising optimization techniques for SpMV is blocking, which can reduce the size of the indexing structures needed to store a sparse matrix and thereby alleviate the pressure on the memory subsystem. In this paper, we study and evaluate a number of representative blocking storage formats on a set of modern microarchitectures that provide up to 64 hardware contexts. Our purpose is to present the merits and drawbacks of each method in relation to the underlying microarchitecture and to provide a consistent overview of the most promising blocking storage methods for sparse matrices that have been presented in the literature.

Keywords-sparse matrix-v...
Vasileios Karakasis, Georgios I. Goumas, Nectarios