Abstract. Computer simulations of realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse linear or eigen systems over the unstructured meshes that model the underlying physical objects. These systems are often solved iteratively, where the sparse matrix-vector multiply (SPMV) is the most expensive operation within each iteration. In this paper, we focus on the efficiency of SPMV using various ordering/partitioning algorithms. We examine different implementations using three leading programming paradigms and architectures. Results show that ordering greatly improves performance, and that cache reuse can be more important than reducing communication. However, a multithreaded implementation indicates that ordering and partitioning are not required on the Tera MTA to obtain an efficient and scalable SPMV.
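For readers unfamiliar with the kernel, the following is a minimal serial sketch of SPMV, assuming the matrix is stored in compressed sparse row (CSR) form. The storage format, the function name `spmv_csr`, and the small example matrix are illustrative assumptions and are not taken from the implementations studied in this paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A held in CSR form (assumed layout).

    row_ptr[i]:row_ptr[i + 1] delimits the stored entries of row i,
    col_idx[k] is the column of entry k, and values[k] is its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access into x: its locality depends on how the
            # mesh vertices (rows and columns) are numbered.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
#   [4 0 1]
#   [0 3 0]
#   [1 0 2]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The indirect reads `x[col_idx[k]]` are the accesses whose cache behavior changes when the rows and columns are renumbered, which is why the ordering of the mesh can affect the cost of this loop.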
Leonid Oliker, Xiaoye S. Li, Gerd Heber, Rupak Bis