When cache blocking of sparse matrix vector multiply works and why



Abstract. We present new performance models and a new, more compact data structure for cache blocking when applied to the sparse matrix-vector multiply (SpM×V) operation, y ← y + A · x. Prior work indicates that cache blocked SpM×V performs very well for some matrix and machine combinations, yielding speedups as high as 3x. We look at the general question of when and why performance improves, finding that cache blocking is most effective when simultaneously 1) x does not fit in cache, 2) y fits in cache, 3) the non-zeros are distributed throughout the matrix, and 4) the non-zero density is sufficiently high. We extend our prior performance models, which bounded performance by assuming x and y fit in cache, to consider these classes of matrices. Unlike our prior model, the updated models are accurate enough to use as a heuristic for predicting the optimum block sizes. We conclude with architectural suggestions that would make processor and memory systems more amenable to SpM×V...
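To make the operation in the abstract concrete, the following is a minimal Python sketch of y ← y + A · x computed once over a whole CSR matrix and once over cache blocks. It is an illustration only, not the authors' compact data structure: the function names (`csr_from_dense`, `spmv_csr`, `cache_block`, `spmv_cache_blocked`), the tile sizes, and the choice to store each tile as its own small CSR matrix are all assumptions made for this example. The idea it shows is that each tile touches only a bounded slice of x and y, so those slices can stay in cache while the tile is processed.

```python
def csr_from_dense(dense):
    """Build (row_ptr, col_idx, vals) CSR arrays from a list-of-lists matrix."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def spmv_csr(csr, x, y, row0=0, col0=0):
    """Accumulate y[row0 + i] += sum_k vals[k] * x[col0 + col_idx[k]]."""
    row_ptr, col_idx, vals = csr
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col0 + col_idx[k]]
        y[row0 + i] += acc


def cache_block(dense, block_rows, block_cols):
    """Split the matrix into block_rows x block_cols tiles (hypothetical layout).

    Each non-empty tile is stored as (row offset, column offset, CSR arrays);
    tiles with no non-zeros are skipped.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            tile = [row[c0:c0 + block_cols] for row in dense[r0:r0 + block_rows]]
            csr = csr_from_dense(tile)
            if csr[2]:
                blocks.append((r0, c0, csr))
    return blocks


def spmv_cache_blocked(blocks, x, y):
    """Apply y <- y + A*x one tile at a time."""
    for r0, c0, csr in blocks:
        spmv_csr(csr, x, y, r0, c0)


# Small demonstration: both variants must produce the same y.
A = [
    [1, 0, 0, 2, 0, 0],
    [0, 3, 0, 0, 0, 4],
    [0, 0, 0, 0, 0, 0],
    [5, 0, 6, 0, 7, 0],
]
x = [1, 2, 3, 4, 5, 6]

y_ref = [1.0] * 4
spmv_csr(csr_from_dense(A), x, y_ref)

blocks = cache_block(A, block_rows=2, block_cols=3)
y_blk = [1.0] * 4
spmv_cache_blocked(blocks, x, y_blk)
```

In a real implementation the tile dimensions would be chosen from the cache size and the matrix structure, which is the question the paper's performance models address. The 2 x 3 tiles here are arbitrary and only exercise the code path.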

Rajesh Nishtala, Richard W. Vuduc, James Demmel, Katherine A. Yelick


AAECC 2007 | Algorithms | Cache | Compact Data Structure | Performance Models


» Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

» Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Lea...

» Exploring the effect of block shapes on the performance of sparse kernels

» Sparse Matrix-Vector multiplication on FPGAs

» The potential of the cell processor for scientific computing

Post Info

Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2007
Where	AAECC
Authors	Rajesh Nishtala, Richard W. Vuduc, James Demmel, Katherine A. Yelick

