While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In...
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, Davi...
Abstract. We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregu...
A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The under...
David S. Wise, Craig Citro, Joshua Hursey, Fang Li...
Sparse matrix problems are di cult to parallelize e ciently on message-passing machines, since they access data through multiple levels of indirection. Inspector executor strategie...
Manuel Ujaldon, Shamik D. Sharma, Joel H. Saltz, E...
Due to power wall, memory wall, and ILP wall, we are facing the end of ever increasing single-threaded performance. For this reason, multicore and manycore processors are arising ...