Sparse matrix operations achieve only a small fraction of peak CPU performance because they rely on specialized, index-based matrix representations, which degrade cache utilization by imposing irregular memory accesses and increasing the overall number of memory accesses. Compounding the problem, the small number of floating-point operations in a single sparse iteration leads to low floating-point pipeline utilization. Operation stacking addresses these problems for large ensemble computations that solve multiple systems of linear equations with identical sparsity structure. By combining the data of multiple problems and solving them as one, operation stacking improves locality, reduces cache misses, and increases floating-point pipeline utilization. Operation stacking also requires less memory bandwidth because it involves fewer index array accesses. This paper presents the Operation Stacking Framework (OSF), an object-oriented framework that provides runtime and code generation support for th...
Mehmet Belgin, Godmar Back, Calvin J. Ribbens
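
The idea summarized in the abstract can be illustrated with a minimal sketch of a sparse matrix-vector product. This is not the OSF API: the function names (`spmv_csr`, `stack`, `spmv_csr_stacked`), the choice of the CSR format, and the interleaved data layout are all assumptions made for illustration. The sketch stores the values of K matrices that share one sparsity structure side by side, so that each column index is read once and reused for all K problems.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline: y = A @ x for a single matrix in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for idx in range(row_ptr[i], row_ptr[i + 1]):
            # One index lookup per floating-point multiply-add.
            y[i] += vals[idx] * x[col_idx[idx]]
    return y


def stack(arrays):
    """Interleave K equal-length arrays: out[e * K + p] = arrays[p][e]."""
    K = len(arrays)
    return [arrays[p][e] for e in range(len(arrays[0])) for p in range(K)]


def spmv_csr_stacked(row_ptr, col_idx, stacked_vals, stacked_x, K):
    """Stacked variant: K products with a shared sparsity structure.

    Assumed layout (hypothetical, for illustration only): element e of
    problem p lives at position e * K + p in every stacked array.
    """
    n = len(row_ptr) - 1
    y = [0.0] * (n * K)
    for i in range(n):
        for idx in range(row_ptr[i], row_ptr[i + 1]):
            col = col_idx[idx]  # one index load serves all K problems
            for p in range(K):
                # Contiguous accesses across the K problems.
                y[i * K + p] += stacked_vals[idx * K + p] * stacked_x[col * K + p]
    return y


# Two 3x3 problems with the same nonzero pattern but different values.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals_a = [1.0, 2.0, 3.0, 4.0, 5.0]
vals_b = [10.0, 20.0, 30.0, 40.0, 50.0]
x_a = [1.0, 1.0, 1.0]
x_b = [1.0, 2.0, 3.0]

y_stacked = spmv_csr_stacked(
    row_ptr, col_idx, stack([vals_a, vals_b]), stack([x_a, x_b]), K=2
)
print(y_stacked)  # [3.0, 70.0, 3.0, 60.0, 9.0, 190.0]
```

In the stacked loop, the index arrays `row_ptr` and `col_idx` are traversed once regardless of K, while the inner loop over `p` performs K multiply-adds on adjacent memory locations. That is the sense in which combining problems can reduce index array accesses and give the floating-point pipeline more work per iteration; a compiled implementation would be needed to observe any actual speedup.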