PLDI
2010
ACM

A GPGPU compiler for memory optimization and parallelism management

This paper presents a novel optimizing compiler for general-purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high-performance GPGPU programs: effective utilization of the GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naïve GPU kernel function, which is functionally correct but written without any consideration for performance optimization. The compiler analyzes the code, identifies its memory access patterns, and generates both the optimized kernel and the kernel invocation parameters. Our optimization process includes vectorization and memory coalescing for memory bandwidth enhancement, tiling and unrolling for data reuse and parallelism management, and thread block remapping or address-offset insertion for partition-camping elimination. The experiments on a set of scientific and media processing algorithms show that our optimized code achieves very high performance, either superior or very cl...
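To illustrate why the memory access pattern matters for coalescing, here is a minimal sketch that counts how many memory segments one group of threads touches under two access patterns. It is not the paper's compiler or its analysis. The warp size, segment size, matrix width, and all function names are assumptions chosen for illustration.

```python
# Illustrative model only: the warp size, segment size and element size below
# are assumptions for this sketch, not figures taken from the paper.
WARP_SIZE = 32        # threads that issue a memory request together (assumed)
SEGMENT_BYTES = 128   # size of one memory transaction segment (assumed)
ELEM_BYTES = 4        # one 32-bit float


def transactions(addresses):
    """Count the distinct memory segments touched by one warp's byte addresses."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def naive_addresses(width, col):
    """Thread t reads a[t][col]: consecutive threads are a full row apart."""
    return [(t * width + col) * ELEM_BYTES for t in range(WARP_SIZE)]


def coalesced_addresses(width, row):
    """Thread t reads a[row][t]: consecutive threads read consecutive floats."""
    return [(row * width + t) * ELEM_BYTES for t in range(WARP_SIZE)]


WIDTH = 1024  # hypothetical matrix width in elements
naive_count = transactions(naive_addresses(WIDTH, col=0))
coalesced_count = transactions(coalesced_addresses(WIDTH, row=0))
print(naive_count, coalesced_count)
```

Under these assumptions, the strided pattern touches one segment per thread while the contiguous pattern fits in a single segment. A coalescing transformation aims to close that kind of gap.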
Yi Yang, Ping Xiang, Jingfei Kong, Huiyang Zhou
Added 10 Jul 2010
Updated 10 Jul 2010
Type Conference
Year 2010
Where PLDI
Authors Yi Yang, Ping Xiang, Jingfei Kong, Huiyang Zhou