Sciweavers

PPOPP
2016
ACM

Performance portable GPU code generation for matrix multiplication

Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device. Achieving performance portability is the holy grail of high-performance computing and has so far remained an open problem, even for well-studied applications like matrix multiplication. We argue that what is needed is a way to describe applications at a high level without committing to particular implementations. To this end, we developed in a previous paper a functional data-parallel language which allows applications to be expressed in a device-neutral way. We use a set of well-defined rewrite rules to automaticall...
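As a rough illustration of the idea in the abstract, the Python sketch below writes matrix multiplication with generic data-parallel primitives (map, reduce, transpose) and then applies one semantics-preserving rewrite, splitting a map into chunks and joining the results. This is not the authors' language or their rule set: every name here (`matmul`, `split`, `join`, `map_tiled`, `matmul_tiled`) is hypothetical, and the choice of a split-join rewrite is an assumption made for illustration, since the truncated abstract does not list the rules.

```python
from functools import reduce
from operator import add, mul


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dot(xs, ys):
    """Dot product written as reduce(+) over map(*) of two vectors."""
    return reduce(add, map(mul, xs, ys), 0)


def matmul(a, b):
    """High-level matrix multiplication: nested maps over rows and columns."""
    bt = transpose(b)
    return [[dot(row, col) for col in bt] for row in a]


def split(n, xs):
    """Cut a list into consecutive chunks of at most n elements."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def join(xss):
    """Flatten one level of nesting (inverse of split)."""
    return [x for xs in xss for x in xs]


def map_tiled(f, xs, n):
    """Rewritten map: map f == join . map (map f) . split n (assumed rule)."""
    return join([list(map(f, chunk)) for chunk in split(n, xs)])


def matmul_tiled(a, b, n):
    """Same computation as matmul, with both maps rewritten into chunked form."""
    bt = transpose(b)
    return map_tiled(
        lambda row: map_tiled(lambda col: dot(row, col), bt, n),
        a,
        n,
    )


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul(a, b))
    print(matmul_tiled(a, b, 1) == matmul(a, b))
```

The chunk size `n` changes how the work is grouped but not the result, which is the property a rewrite-based code generator would need in order to explore device-specific variants of one high-level program.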
Added 09 Apr 2016
Updated 09 Apr 2016
Type Journal
Year 2016
Where PPOPP
Authors Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Christophe Dubach