Automating the generation of composed linear algebra kernels

Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), and highly efficient implementations of those routines enable scientists to achieve high performance at little cost. However, tuning the BLAS in isolation misses opportunities for memory optimization that result from composing multiple subprograms. Because it is not practical to create a library of all BLAS combinations, we have developed a domain-specific compiler that generates them on demand. In this paper, we describe a novel algorithm for compiling linear algebra kernels and searching for the best combination of optimization choices. We also present a new hybrid analytic/empirical method for quickly evaluating the profitability of each optimization. We report experimental results showing speedups of up to 130% relative to the GotoBLAS on an AMD Opteron and up to 137% relative to MKL on an Intel Core 2.
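
As a minimal illustration of the kind of memory optimization that composing BLAS calls makes possible, consider computing q = A p and s = A^T r. This is a sketch of plain loop fusion, not the authors' compiler or the code it generates. Two separate GEMV calls each stream the whole matrix A from memory, whereas a fused loop loads each element of A once and uses it for both products. The function names below are hypothetical, and the `reads` counter is only a crude stand-in for memory traffic; the real benefit appears in compiled code, where A no longer has to travel from main memory twice.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def gemv_separate(A: Matrix, p: Vector, r: Vector) -> Tuple[Vector, Vector, int]:
    """Compute q = A p and s = A^T r with two independent loop nests,
    as two separate BLAS GEMV calls would. Also returns the number of
    element loads from A (hypothetical traffic proxy)."""
    m, n = len(A), len(A[0])
    reads = 0
    q = [0.0] * m
    for i in range(m):          # first full pass over A: q = A p
        for j in range(n):
            q[i] += A[i][j] * p[j]
            reads += 1
    s = [0.0] * n
    for i in range(m):          # second full pass over A: s = A^T r
        for j in range(n):
            s[j] += A[i][j] * r[i]
            reads += 1
    return q, s, reads


def gemv_fused(A: Matrix, p: Vector, r: Vector) -> Tuple[Vector, Vector, int]:
    """Same two products computed in a single fused loop nest."""
    m, n = len(A), len(A[0])
    reads = 0
    q = [0.0] * m
    s = [0.0] * n
    for i in range(m):
        for j in range(n):
            a = A[i][j]         # load each element of A exactly once ...
            reads += 1
            q[i] += a * p[j]    # ... and use it for A p
            s[j] += a * r[i]    # ... and for A^T r
    return q, s, reads


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    p = [1.0, 0.0, -1.0]
    r = [2.0, 1.0]
    print(gemv_separate(A, p, r))  # ([-2.0, -2.0], [6.0, 9.0, 12.0], 12)
    print(gemv_fused(A, p, r))     # ([-2.0, -2.0], [6.0, 9.0, 12.0], 6)
```

Both versions produce identical results, but the fused version touches A half as often. Deciding when such a fusion pays off (and how it interacts with other choices) is the kind of question the abstract says the compiler answers with its search and profitability evaluation.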

Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, Jeremy G. Siek

Applied Computing | Basic Linear Algebra Subprograms | Linear Algebra | Linear Algebra Kernels | SC 2009

Post Info

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	SC
Authors	Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, Jeremy G. Siek