Sciweavers

656 search results - page 62 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
Sort
View
EUROPAR
2009
Springer
14 years 2 months ago
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs
While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/p...
Eduard Ayguadé, Rosa M. Badia, Francisco D....
IPPS
1998
IEEE
13 years 12 months ago
High Performance Linear Algebra Package LAPACK90
Abstract. LAPACK90 is a set of LAPACK90 subroutines which interfaces FORTRAN90 with LAPACK. All LAPACK driver subroutines including expert drivers and some LAPACK computationals ha...
Jack Dongarra, Jerzy Wasniewski
IPPS
1996
IEEE
13 years 12 months ago
A Method for Register Allocation to Loops in Multiple Register File Architectures
Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocat...
David J. Kolson, Alexandru Nicolau, Nikil D. Dutt,...
PPOPP
2010
ACM
14 years 5 months ago
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
IPPS
2008
IEEE
14 years 2 months ago
SNAP, Small-world Network Analysis and Partitioning: An open-source parallel graph framework for the exploration of large-scale
We present SNAP (Small-world Network Analysis and Partitioning), an open-source graph framework for exploratory study and partitioning of large-scale networks. To illustrate the c...
David A. Bader, Kamesh Madduri