Optimized Dense Matrix Multiplication on a Many-Core Architecture

Download www.capsl.udel.edu

Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), belong to a new set of many-core-on-a-chip systems with a software-managed memory hierarchy. New programming and compiling methodologies are required to fully exploit the potential of this new class of architectures. In this paper, we use dense matrix multiplication as a case study to present a general methodology to map applications to these kinds of architectures. Our methodology has the following characteristics: (1) Balanced distribution of work among threads to fully exploit available resources. (2) Optimal register tiling and sequence of traversing tiles, calculated analytically and parametrized according to the register file size of the processor used. This results in minimal memory transfers and optimal register usage. (3) Implementation of architecture-specific optimizations to further increase ...
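To make the register tiling idea in point (2) concrete, the following is a minimal single-threaded sketch in C. It is not the authors' implementation and does not target the C64: the function names, the row-major layout, and the 2 x 2 tile shape (`TILE_M`, `TILE_N`) are assumptions chosen only for illustration, whereas the paper derives the tile shape analytically from the register file size. The sketch shows only the basic technique: a small block of the result is accumulated in local variables, so each loaded element of A and B is reused several times before the block is written back once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register-tile shape (an assumption for illustration only).
   A TILE_M x TILE_N block of C is held in local accumulators that a
   compiler can typically keep in registers. */
#define TILE_M 2
#define TILE_N 2

/* Plain triple loop: C = A * B for row-major n x n matrices. */
void matmul_reference(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Same product, computed one TILE_M x TILE_N block of C at a time. */
void matmul_register_tiled(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i0 = 0; i0 < n; i0 += TILE_M) {
        /* Clamp the tile height at the bottom edge of the matrix. */
        size_t mi = (n - i0 < TILE_M) ? n - i0 : TILE_M;

        for (size_t j0 = 0; j0 < n; j0 += TILE_N) {
            /* Clamp the tile width at the right edge of the matrix. */
            size_t nj = (n - j0 < TILE_N) ? n - j0 : TILE_N;

            /* Accumulators for the current block of C. */
            double acc[TILE_M][TILE_N] = {{0.0}};

            for (size_t k = 0; k < n; k++) {
                for (size_t i = 0; i < mi; i++) {
                    /* Each element of A is loaded once and reused nj times. */
                    double a = A[(i0 + i) * n + k];
                    for (size_t j = 0; j < nj; j++)
                        acc[i][j] += a * B[k * n + (j0 + j)];
                }
            }

            /* Write the finished block back to memory exactly once. */
            for (size_t i = 0; i < mi; i++)
                for (size_t j = 0; j < nj; j++)
                    C[(i0 + i) * n + (j0 + j)] = acc[i][j];
        }
    }
}
```

Calling `matmul_register_tiled(3, A, B, C)` on 3 x 3 inputs exercises both full and partial tiles, and its result can be checked against `matmul_reference`. Work distribution among threads and the architecture-specific optimizations mentioned in the abstract are deliberately left out of this sketch.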

Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guang R. Gao

Assume Cache-based Parallel | Distributed And Parallel Computing | EUROPAR 2010 | Optimal Register | Parallel Programming Methodologies |

Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	EUROPAR
Authors	Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guang R. Gao
