PPoPP 2009 (ACM)

OpenMP to GPGPU: a compiler framework for automatic translation and optimization

GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although NVIDIA's new Compute Unified Device Architecture (CUDA) programming model offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. We identify several key transformation techniques that enable efficient access to GPU global memory and thereby achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance impr...
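The core step the abstract describes, turning an OpenMP work-shared loop into CUDA-style threads, can be illustrated with a small sketch. The Python below is a hypothetical, serial CPU emulation, not the authors' translator or its output. It assumes one GPU thread per iteration of the outer `parallel for` loop of a JACOBI-style sweep, and every name in it (`BLOCK_DIM`, `launch_config`, `thread_iteration`, `jacobi_gpu_emulated`, `adjacent_thread_stride`) is made up for illustration. It also reports the row-major distance between elements touched by neighboring threads, one simple way to see why the thread mapping matters for global-memory efficiency.

```python
"""Hypothetical sketch: emulating how an OpenMP 'parallel for' loop could be
partitioned across CUDA-style thread blocks. Runs serially on the CPU; this
is NOT the translator described in the paper, only an illustration."""
import math

BLOCK_DIM = 4  # hypothetical number of threads per block


def launch_config(lo, hi, block_dim=BLOCK_DIM):
    """Number of blocks needed so that every iteration lo..hi-1 gets a thread."""
    return math.ceil((hi - lo) / block_dim)


def thread_iteration(block_idx, thread_idx, lo, hi, block_dim=BLOCK_DIM):
    """Loop index owned by one thread (global id + lo), or None past the end."""
    i = lo + block_idx * block_dim + thread_idx
    return i if i < hi else None


def jacobi_serial(a):
    """Reference: one Jacobi sweep over the interior points. In OpenMP the
    outer i-loop would carry '#pragma omp parallel for'; here it runs serially."""
    n = len(a)
    b = [row[:] for row in a]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])
    return b


def jacobi_gpu_emulated(a, block_dim=BLOCK_DIM):
    """Same sweep, but each emulated (block, thread) pair owns one row i,
    mimicking a kernel launch with launch_config(...) blocks."""
    n = len(a)
    b = [row[:] for row in a]
    for blk in range(launch_config(1, n - 1, block_dim)):
        for t in range(block_dim):
            i = thread_iteration(blk, t, 1, n - 1, block_dim)
            if i is None:  # guard for the partially filled last block
                continue
            for j in range(1, n - 1):
                b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])
    return b


def row_major_offset(i, j, n):
    """Element offset of a[i][j] in a row-major n x n array (C layout)."""
    return i * n + j


def adjacent_thread_stride(n, threads_own_rows, i=1, j=1):
    """Element distance between the locations two neighboring threads touch
    at the same step. Stride 1 is what GPU memory coalescing favors."""
    if threads_own_rows:  # thread t handles row i + t
        return row_major_offset(i + 1, j, n) - row_major_offset(i, j, n)
    return row_major_offset(i, j + 1, n) - row_major_offset(i, j, n)  # column per thread


if __name__ == "__main__":
    n = 8
    a = [[float(i * n + j) for j in range(n)] for i in range(n)]
    assert jacobi_gpu_emulated(a) == jacobi_serial(a)
    print(launch_config(1, n - 1))           # 2 (six interior rows, 4 threads per block)
    print(adjacent_thread_stride(n, True))   # 8
    print(adjacent_thread_stride(n, False))  # 1
```

With threads owning rows, neighboring threads touch addresses `n` elements apart, which GPUs generally cannot combine into a single memory transaction; assigning neighboring threads to neighboring columns gives stride 1. Which remappings the paper's translator actually applies is not stated in this summary.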

Seyong Lee, Seung-Jai Min, Rudolf Eigenmann

Related Content

» OpenUH: an optimizing, portable OpenMP compiler

» A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries

» Optimizing irregular shared-memory applications for distributed-memory systems

» Effective Cross-Platform Multilevel Parallelism via Dynamic Adaptive Execution

» Scheduling FFT computation on SMP and multicore systems

» Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Opt...

» Design of a WCET-Aware C Compiler

» Communication optimizations for global multithreaded instruction scheduling

» Parallelizing sequential applications on commodity hardware using a low-cost software trans...

Post Info

Added	25 Nov 2009
Updated	25 Nov 2009
Type	Conference
Year	2009
Where	PPOPP
Authors	Seyong Lee, Seung-Jai Min, Rudolf Eigenmann