Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential computing platform for a broad range of scientific research and engineering practice. However, because the GPU is designed specifically for massive data-parallel computing, its performance is notoriously sensitive to the presence of conditional statements in a GPU application. When threads diverge at a conditional branch, the threads taking different paths must run serially. Thread divergence often causes serious performance degradation, hindering the adoption of GPUs for a broad class of applications that contain non-trivial branches and certain types of loops. This paper presents a systematic investigation into the use of runtime thread-data remapping to solve that problem. It introduces a series of theoretical and empirical findings on divergence elimination. It describes the complexity and algo...
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen
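To make the divergence problem and the idea of thread-data remapping concrete, the following is a minimal host-side simulation, not the paper's algorithm. It assumes a warp of 32 threads, a hypothetical branch predicate, and a hypothetical remapping that simply partitions the data indices by branch outcome so that each thread reads its element through an index table (`mapping`). The helper names `divergent_warps` and `remap_by_branch` are illustrative only, and the simulation ignores the runtime cost of computing and applying the remapping.

```python
from typing import Callable, List, Sequence

WARP_SIZE = 32  # assumed number of threads that execute in lockstep


def divergent_warps(data: Sequence[int],
                    pred: Callable[[int], bool],
                    mapping: Sequence[int],
                    warp_size: int = WARP_SIZE) -> int:
    """Count warps whose threads disagree on the branch outcome.

    Thread t processes data[mapping[t]]; a warp is divergent when its
    threads evaluate `pred` to both True and False, forcing the two
    paths to run serially.
    """
    count = 0
    for start in range(0, len(mapping), warp_size):
        end = min(start + warp_size, len(mapping))
        outcomes = {pred(data[mapping[t]]) for t in range(start, end)}
        if len(outcomes) > 1:
            count += 1
    return count


def remap_by_branch(data: Sequence[int],
                    pred: Callable[[int], bool]) -> List[int]:
    """Hypothetical remapping: group element indices by branch outcome.

    After this partition, at most one warp (the one straddling the
    boundary between the two groups) can still be divergent.
    """
    taken = [i for i, x in enumerate(data) if pred(x)]
    not_taken = [i for i, x in enumerate(data) if not pred(x)]
    return taken + not_taken


if __name__ == "__main__":
    data = list(range(128))
    is_even = lambda x: x % 2 == 0            # illustrative branch condition
    identity = list(range(len(data)))          # thread t handles data[t]
    print(divergent_warps(data, is_even, identity))          # 4
    remapped = remap_by_branch(data, is_even)
    print(divergent_warps(data, is_even, remapped))          # 0
```

With the identity mapping, every warp mixes even and odd elements, so all four warps diverge; after remapping, each warp sees elements that take the same path. Redirecting through an index table avoids physically moving the data, at the cost of an extra indirect load per thread.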