Toward Scalable Matrix Multiply on Multithreaded Architectures



We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition, we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16-CPU Itanium2 server supports these observations.
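One way to see why a one-dimensional partitioning might scale poorly is to count how many matrix elements each thread has to touch when computing C = A B. The sketch below is a simple counting model, not the paper's experiment or implementation: the function names `footprint_1d` and `footprint_2d` are hypothetical, the matrix sizes are arbitrary, and cache behavior and the cost of copying data to contiguous memory are not modeled.

```python
from math import isqrt


def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def footprint_1d(m, n, k, p):
    """Elements of A, B and C touched by one thread when C (m x n)
    is split into p row panels. Every thread reads all of B."""
    rows = ceil_div(m, p)
    return rows * k + k * n + rows * n


def footprint_2d(m, n, k, r, c):
    """Elements of A, B and C touched by one thread when C (m x n)
    is split into an r x c grid of blocks. Each thread reads only
    a row panel of A and a column panel of B."""
    rows = ceil_div(m, r)
    cols = ceil_div(n, c)
    return rows * k + k * cols + rows * cols


if __name__ == "__main__":
    m = n = k = 4096  # assumed sizes, chosen only for illustration
    for p in (4, 16):
        r = isqrt(p)  # square thread grid for the 2D case
        c = p // r
        print(p, footprint_1d(m, n, k, p), footprint_2d(m, n, k, r, c))
```

In this model, the per-thread footprint of the 1D scheme stays dominated by the full copy of B as the thread count grows, while the 2D scheme shrinks both operand panels. That is consistent with the abstract's claim, though the measured results on the Itanium2 server are what actually support it.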

Bryan Marker, Field G. Van Zee, Kazushige Goto, Gregorio Quintana-Ortí, Robert A. van de Geijn


EUROPAR 2007 | Linear Algebra | Linear Algebra Libraries | Memory Architectures |


Post Info

Added	07 Jun 2010
Updated	07 Jun 2010
Type	Conference
Year	2007
Where	EUROPAR
Authors	Bryan Marker, Field G. Van Zee, Kazushige Goto, Gregorio Quintana-Ortí, Robert A. van de Geijn
