Sciweavers

14 search results - page 1 / 3
» High-performance implementation of the level-3 BLAS
PPSC
1989
13 years 9 months ago
Evaluating Block Algorithm Variants in LAPACK
The LAPACK software project currently under development is intended to provide a portable linear algebra library for high performance computers. LAPACK will make use of the Level 1...
Ed Anderson, Jack Dongarra
TOMS
2008
13 years 8 months ago
High-performance implementation of the level-3 BLAS
Kazushige Goto, Robert A. van de Geijn
PPOPP
2010
ACM
14 years 5 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley
SC
2009
ACM
14 years 3 months ago
Automating the generation of composed linear algebra kernels
Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...
EUROPAR
2001
Springer
14 years 1 months ago
Parallel Implementation of a Block Algorithm for Matrix 1-Norm Estimation
We describe a parallel Fortran 77 implementation, in ScaLAPACK style, of a block matrix 1-norm estimator of Higham and Tisseur. This estimator differs from that underlyi...
Sheung Hun Cheng, Nicholas J. Higham