Sciweavers

656 search results - page 48 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
IPPS
1998
IEEE
13 years 12 months ago
Compiler-Optimization of Implicit Reductions for Distributed Memory Multiprocessors
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implic...
Bo Lu, John M. Mellor-Crummey
SPAA
1998
ACM
13 years 12 months ago
Elimination Forest Guided 2D Sparse LU Factorization
Sparse LU factorization with partial pivoting is important for many scientific applications and delivering high performance for this problem is difficult on distributed memory machin...
Kai Shen, Xiangmin Jiao, Tao Yang
IPPS
2009
IEEE
14 years 2 months ago
Work-first and help-first scheduling policies for async-finish task parallelism
Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel compu...
Yi Guo, Rajkishore Barik, Raghavan Raman, Vivek Sa...
PPOPP
1997
ACM
13 years 11 months ago
LoPC: Modeling Contention in Parallel Algorithms
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the L...
Matthew Frank, Anant Agarwal, Mary K. Vernon
PVM
1997
Springer
13 years 11 months ago
Performance of CAP-Specified Linear Algebra Algorithms
The traditional approach to the parallelization of linear algebra algorithms such as matrix multiplication and LU factorization calls for static allocation of matrix blocks to proc...
Marc Mazzariol, Benoit A. Gennart, Vincent Messerl...