Sciweavers

ICPADS
2007
IEEE
Optimizing Katsevich image reconstruction algorithm on multicore processors
The Katsevich image reconstruction algorithm is the first theoretically exact cone beam image reconstruction algorithm for a helical scanning path in computed tomography (CT). Ho...
Eric Fontaine, Hsien-Hsin S. Lee
EUROPAR
2010
Springer
Maestro: Data Orchestration and Tuning for OpenCL Devices
As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel program...
Kyle Spafford, Jeremy S. Meredith, Jeffrey S. Vett...
JSSPP
2004
Springer
Multi-toroidal Interconnects: Using Additional Communication Links to Improve Utilization of Parallel Computers
The three-dimensional torus is a common network interconnect topology for multicomputers because of its simplicity and high scalability. A parallel job submitted to a three-dimensional...
Yariv Aridor, Tamar Domany, Oleg Goldshmidt, Edi S...
IPPS
2002
IEEE
Variable Partitioning and Scheduling of Multiple Memory Architectures for DSP
A multiple-memory-module architecture offers higher memory access bandwidth and thus higher performance. Two key problems in gaining high performance in this kind of architecture ar...
Qingfeng Zhuge, Bin Xiao, Edwin Hsing-Mean Sha
IPPS
2009
IEEE
Exploring the effect of block shapes on the performance of sparse kernels
In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by ...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...