Sciweavers

656 search results - page 14 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
SPAA
1996
ACM
13 years 11 months ago
An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms
In this paper, we analyze the performance of parallel multithreaded algorithms that use dag-consistent distributed shared memory. Specifically, we analyze execution time, page fau...
Robert D. Blumofe, Matteo Frigo, Christopher F. Jo...
ARC
2010
Springer
14 years 2 months ago
Optimising Memory Bandwidth Use for Matrix-Vector Multiplication in Iterative Methods
Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [1–3]...
David Boland, George A. Constantinides
CLUSTER
2011
IEEE
12 years 7 months ago
Achieving Scalable Parallelization for the Hessenberg Factorization
Much of dense linear algebra has been successfully blocked to concentrate the majority of its time in the Level 3 BLAS, which are not only efficient for serial computation, but...
Anthony M. Castaldo, R. Clint Whaley
CF
2008
ACM
13 years 9 months ago
Multi-terabit IP lookup using parallel bidirectional pipelines
To meet growing terabit link rates, highly parallel and scalable architectures are needed for IP lookup engines in next generation routers. This paper proposes an SRAM-based multi...
Weirong Jiang, Viktor K. Prasanna
PDP
2010
IEEE
14 years 24 days ago
A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well sui...
Marco Ament, Günter Knittel, Daniel Weiskopf,...