Creating a high throughput sparse matrix vector multiplication (SpMxV) implementation depends on a balanced system design. In this paper, we introduce the innovative SpMxV Solver designed for FPGAs (SSF). Besides high computational throughput, system performance is optimized by reducing initialization time and overheads, minimizing and overlapping I/O operations, and increasing scalability. SSF accepts any matrix size and can be easily adapted to different data formats. SSF minimizes the control logic by taking advantage of the data flow via an innovative accumulation circuit which uses pipelined floating point adders. Compared to optimized software codes on a Pentium 4 microprocessor, our design achieves up to 20x speedup. Keywords FPGA, Sparse matrix multiplication, Performance
Junqing Sun, Gregory D. Peterson, Olaf O. Storaasl