Sciweavers

38 search results - page 3 / 8
» Parallel Tiled QR Factorization for Multicore Architectures
ASPLOS
2009
ACM
14 years 9 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and an upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
SC
2009
ACM
14 years 3 months ago
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a tas...
Fengguang Song, Asim YarKhan, Jack Dongarra
AAECC
2007
Springer
13 years 8 months ago
Towards an accurate performance modeling of parallel sparse factorization
We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider supern...
Laura Grigori, Xiaoye S. Li
JEC
2006
13 years 8 months ago
Synchroscalar: Evaluation of an embedded, multi-core architecture for media applications
We present an overview of the Synchroscalar single-chip, multi-core processor. Through the design of Synchroscalar, we find that high energy efficiency and low complexity can be a...
John Oliver, Ravishankar Rao, Diana Franklin, Fred...
PPOPP
2010
ACM
14 years 5 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley