Sciweavers

656 search results - page 6 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
Sort
View
ICPP
2009
IEEE
14 years 2 months ago
Perfomance Models for Blocked Sparse Matrix-Vector Multiplication Kernels
—Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architec...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...
EUROPAR
2010
Springer
13 years 8 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
ICS
1995
Tsinghua U.
13 years 11 months ago
Data Forwarding in Scalable Shared-Memory Multiprocessors
David Koufaty, Xiangfeng Chen, David K. Poulsen, J...
EUROPAR
2006
Springer
13 years 11 months ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64(C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...