Fast GPGPU Data Rearrangement Kernels using CUDA

Download www.hipc.org

Abstract: Many high-performance computing algorithms are bandwidth-limited, hence the need for optimal data rearrangement kernels that also integrate easily into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m-dimensional data into n dimensions, such as Permute, Reorder, and Interlace/Deinterlace. We have also built kernels for generic stencil computations on two-dimensional data, using templates and functors that let application developers rapidly build customized high-performance kernels. All the kernels achieve or surpass the best-known performance in terms of bandwidth utilization.
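As an illustration only (not the authors' library or its API), the index arithmetic behind Interlace/Deinterlace and the template-plus-functor pattern for 2D stencils can be sketched as plain C++ host code. The names `deinterlace`, `interlace`, `apply_stencil` and `FivePointAverage` are hypothetical. In a CUDA version, each iteration over output elements would typically become one thread, with the same index mapping.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split an interleaved buffer (e.g. RGBRGB...) holding
// `count` elements of `channels` components each into planar layout
// (RRR...GGG...BBB...). Every output element is written exactly once.
template <typename T>
std::vector<T> deinterlace(const std::vector<T>& in, std::size_t count,
                           std::size_t channels) {
    std::vector<T> out(count * channels);
    for (std::size_t c = 0; c < channels; ++c)
        for (std::size_t i = 0; i < count; ++i)
            out[c * count + i] = in[i * channels + c];
    return out;
}

// Hypothetical inverse mapping: planar layout back to interleaved layout.
template <typename T>
std::vector<T> interlace(const std::vector<T>& in, std::size_t count,
                         std::size_t channels) {
    std::vector<T> out(count * channels);
    for (std::size_t i = 0; i < count; ++i)
        for (std::size_t c = 0; c < channels; ++c)
            out[i * channels + c] = in[c * count + i];
    return out;
}

// Hypothetical generic 2D stencil driver: the grid traversal is fixed and the
// per-point computation is supplied as a functor. Boundary points are copied
// unchanged; only interior points are updated.
template <typename T, typename Op>
std::vector<T> apply_stencil(const std::vector<T>& in, std::size_t rows,
                             std::size_t cols, Op op) {
    std::vector<T> out(in);  // start from the input so boundaries are kept
    for (std::size_t y = 1; y + 1 < rows; ++y)
        for (std::size_t x = 1; x + 1 < cols; ++x)
            out[y * cols + x] = op(in.data(), cols, y, x);
    return out;
}

// Example functor: 5-point average of a point and its four neighbours.
struct FivePointAverage {
    template <typename T>
    T operator()(const T* a, std::size_t cols, std::size_t y,
                 std::size_t x) const {
        return (a[y * cols + x] + a[(y - 1) * cols + x] +
                a[(y + 1) * cols + x] + a[y * cols + x - 1] +
                a[y * cols + x + 1]) / T(5);
    }
};
```

For example, deinterlacing `{1, 2, 3, 4, 5, 6}` (two RGB pixels) with `count = 2` and `channels = 3` yields `{1, 4, 2, 5, 3, 6}`. Separating the traversal from the per-point operation is what allows one driver to serve many stencils; on a GPU, one side of such a rearrangement is strided, so bandwidth-oriented kernels typically stage data through shared memory to keep global-memory accesses coalesced.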

Michael Bader, Hans-Joachim Bungartz, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan

CORR 2010 | Education | Generic Kernels | Generic Stencil Computations | Optimal Data Rearrangement

» Evolving a CUDA kernel from an nVidia template

» Accelerating Parameter Sweep Applications Using CUDA

» Interblock GPU communication via fast barrier synchronization

» Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2010
Where	CORR
Authors	Michael Bader, Hans-Joachim Bungartz, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan

Sciweavers