Performance portable GPU code generation for matrix multiplication

Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left to ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device. Achieving performance portability is the holy grail of high-performance computing and has so far remained an open problem, even for well-studied applications like matrix multiplication. We argue that what is needed is a way to describe applications at a high level without committing to particular implementations. To this end, we developed in a previous paper a functional data-parallel language which allows applications to be expressed in a device-neutral way. We use a set of well-defined rewrite rules to automaticall...
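To make the idea of a device-neutral, functional description concrete, here is a minimal sketch in Python. It is not the authors' language or their rule set. The names dot, transpose, matmul, split, join and map_tiled are hypothetical and chosen only for illustration. The sketch expresses matrix multiplication purely with map, zip and reduce. It then shows one semantics-preserving rewrite of the kind a rule-based compiler might apply: a map over a list is replaced by a map over fixed-size chunks followed by a join, which yields the same result while exposing a tiling choice.

```python
from functools import reduce
from operator import add, mul


def dot(xs, ys):
    # Dot product as reduce(+) over an element-wise multiply of two vectors.
    return reduce(add, map(mul, xs, ys), 0)


def transpose(m):
    # Turn the columns of m into rows so that columns can be traversed as vectors.
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # High-level description: every row of a is paired with every column of b.
    # Nothing here says how the work is mapped to threads or memory.
    bt = transpose(b)
    return [[dot(row, col) for col in bt] for row in a]


def split(n, xs):
    # Cut xs into consecutive chunks of length n (the last chunk may be shorter).
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def join(xss):
    # Flatten one level of nesting; the inverse of split.
    return [x for xs in xss for x in xs]


def map_tiled(f, n, xs):
    # Illustrative rewrite (an assumption, not taken from the abstract):
    #   map f xs  ==  join(map(map f, split(n, xs)))
    # The result is unchanged, but the chunk size n becomes a tunable parameter.
    return join([list(map(f, chunk)) for chunk in split(n, xs)])


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul(a, b))
    print(map_tiled(lambda x: 2 * x, 2, [1, 2, 3, 4, 5]))
```

The point of the sketch is that matmul fixes only what is computed, while a rewrite such as map_tiled changes how it is computed without changing the result. Different chunk sizes could suit different devices.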

Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Christophe Dubach

Distributed And Parallel Computing | PPOPP 2016 |

» Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodolo...

» Parallel Sparse Matrix Computations Using the PINEAPL Library A Performance Study

» A performance prediction model for the CUDA GPGPU platform

» Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes

» Adaptive input-aware compilation for graphics engines

» Small Discrete Fourier Transforms on GPUs

» PARRAY: a unifying array representation for heterogeneous parallelism

» AnySL: efficient and portable shading for ray tracing

» A Note on Autotuning GEMM for GPUs

Post Info

Added	09 Apr 2016
Updated	09 Apr 2016
Type	Journal
Year	2016
Where	PPOPP
Authors	Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Christophe Dubach
