Sciweavers

SPAA
2010
ACM
13 years 9 months ago
Managing the complexity of lookahead for LU factorization with pivoting
We describe parallel implementations of LU factorization with pivoting for multicore architectures. Implementations that differ along two dimensions are discussed: (1) usin...
Ernie Chan, Robert A. van de Geijn, Andrew Chapman
ERSA
2007
14 years 28 days ago
High-Precision BLAS on FPGA-enhanced Computers
The emergence of high-density reconfigurable hardware devices gives scientists and engineers an option to accelerate their numerical computing applications on low-cost but power...
Chuan He, Guan Qin, Richard E. Ewing, Wei Zhao
ARCS
2008
Springer
14 years 1 month ago
An Optimized ZGEMM Implementation for the Cell BE
The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
PARA
1995
Springer
14 years 3 months ago
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS are targeted at distributed vector-vector, matrix-vector and matrix-matrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
ISPDC
2008
IEEE
14 years 5 months ago
Heterogeneous PBLAS: Optimization of PBLAS for Heterogeneous Computational Clusters
This paper presents a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for hetero...
Ravi Reddy Manumachu, Alexey L. Lastovetsky, Pedro...
IPPS
2009
IEEE
14 years 6 months ago
Generation of Synthetic Floating-Point benchmark circuits
Synthetic Floating-Point (SFP), a synthetic benchmark generator program for floating-point circuits, is presented. SFP consists of two independent modules for characterisation and...
T. Chun Pong Chau, S. Man Ho Ho, Philip H. W. Leon...
SC
2009
ACM
14 years 6 months ago
Automating the generation of composed linear algebra kernels
Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...