ACMMSP
2006
ACM

Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms

A blossoming paradigm for block-recursive matrix algorithms is presented that attains excellent performance on seven measures at once: time, TLB misses, L1 misses, L2 misses, paging to disk, scaling on distributed processors, and portability to multiple platforms. It provides a philosophy and tools that allow the programmer to deal with the memory hierarchy invisibly, from L1 and L2 to TLB, paging, and interprocessor communication. Used together, they provide a cache-oblivious style of programming. Plots are presented to support these claims on an implementation of Cholesky factorization crafted directly from the paradigm in C with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation to take advantage of hardware and compiler optimizations. In particular, this code beats Intel’s Math Kernel Library and matches AMD’s Core Math Library, losing a bit on L1 misses while winning decisivel...
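The abstract names a Morton-hybrid representation without describing it here. As a rough illustration only, the sketch below assumes the common reading of that term: tiles of the matrix are laid out in Morton (Z-order) sequence, and elements inside each tile are stored row-major. The function names, the tile size `TILE`, and the 16-bit limit on indices are all assumptions made for this example and are not taken from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile edge length, chosen only for illustration. */
#define TILE 4u

/* Spread the low 16 bits of x so that bit k moves to bit 2k. */
static uint32_t spread_bits(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Plain Morton index: row bits land in odd positions, column bits in
 * even positions, so each 2x2 quadrant group is contiguous in memory. */
static uint32_t morton_index(uint32_t row, uint32_t col)
{
    return (spread_bits(row) << 1) | spread_bits(col);
}

/* Assumed hybrid layout: Morton order across TILE x TILE tiles,
 * row-major order inside each tile. */
static size_t hybrid_index(uint32_t row, uint32_t col)
{
    uint32_t tile = morton_index(row / TILE, col / TILE);
    return (size_t)tile * (TILE * TILE) + (row % TILE) * TILE + (col % TILE);
}
```

With `TILE` set to 4, element (5, 6) falls in tile (1, 1), which is the fourth tile in Morton order, at offset (1, 2) inside it. A block-recursive algorithm that splits a matrix into quadrants would find each quadrant contiguous under such a layout, while the row-major tiles keep the innermost loops in a form that compilers can usually optimize well.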
Michael D. Adams, David S. Wise
Added: 13 Jun 2010
Updated: 13 Jun 2010
Type: Conference
Year: 2006
Where: ACMMSP