Sciweavers

JPDC
2008

Fast parallel GPU-sorting using a hybrid algorithm

This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists, which are then sorted in parallel using mergesort. The parallel bucketsort, implemented in NVIDIA's CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs. The mergesort requires scattered writing, which is exposed by CUDA and ATI's Data Parallel Virtual Machine[1]. For lists with more than 512k elements, the algorithm performs better than the bitonic sort algorithms, which have been considered the fastest for GPU sorting, and is more than twice as fast for 8M elements. It is 6 to 14 times faster than single-CPU quicksort for 1M to 8M elements, respectively. In addition, the new GPU algorithm sorts in n log n time, as opposed to the standard n (log n)^2 for bitonic sort...
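The two-phase idea in the abstract (bucket the input, then sort each bucket independently) can be illustrated with a minimal CPU-side sketch in C++. This is not the authors' CUDA implementation: the names hybrid_sort and bucket_of are hypothetical, the equal-width bucket boundaries are an assumption, std::atomic's fetch_add only stands in for the GPU atomic increment, and std::stable_sort stands in for the paper's parallel mergesort. The loops run sequentially here; on a GPU each loop body would be executed by many threads at once. The sketch also assumes the input contains no NaN values.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps v in [lo, hi] to one of num_buckets
// equal-width ranges. The mapping is monotone, so every element of
// bucket b is <= every element of bucket b + 1.
static std::size_t bucket_of(float v, float lo, float hi,
                             std::size_t num_buckets) {
    if (hi <= lo) return 0;  // all values equal: a single bucket suffices
    double t = (static_cast<double>(v) - lo) /
               (static_cast<double>(hi) - lo);
    std::size_t b = static_cast<std::size_t>(t * num_buckets);
    return std::min(b, num_buckets - 1);  // clamp v == hi into the last bucket
}

// Sketch of a bucket-then-sort hybrid (assumed structure, not the paper's code).
std::vector<float> hybrid_sort(const std::vector<float>& in,
                               std::size_t num_buckets) {
    assert(num_buckets > 0);
    if (in.empty()) return {};

    auto mm = std::minmax_element(in.begin(), in.end());
    const float lo = *mm.first;
    const float hi = *mm.second;

    // Pass 1: count elements per bucket with atomic increments.
    std::vector<std::atomic<std::size_t>> count(num_buckets);
    for (auto& c : count) c.store(0);
    for (float v : in) count[bucket_of(v, lo, hi, num_buckets)].fetch_add(1);

    // Exclusive prefix sum gives each bucket's start offset in the output.
    std::vector<std::size_t> start(num_buckets + 1, 0);
    for (std::size_t b = 0; b < num_buckets; ++b)
        start[b + 1] = start[b] + count[b].load();

    // Pass 2: scatter each element into its bucket; the atomic cursor
    // hands out a unique slot even if many writers hit the same bucket.
    std::vector<std::atomic<std::size_t>> cursor(num_buckets);
    for (auto& c : cursor) c.store(0);
    std::vector<float> out(in.size());
    for (float v : in) {
        const std::size_t b = bucket_of(v, lo, hi, num_buckets);
        out[start[b] + cursor[b].fetch_add(1)] = v;
    }

    // Pass 3: sort each bucket independently; the buckets are already
    // ordered relative to each other, so the concatenation is sorted.
    for (std::size_t b = 0; b < num_buckets; ++b)
        std::stable_sort(out.begin() + start[b], out.begin() + start[b + 1]);

    return out;
}
```

Calling hybrid_sort(values, 4) on a std::vector<float> returns a sorted copy. Because the buckets cover disjoint value ranges, no final merge across buckets is needed, which is what lets the per-bucket sorts proceed independently.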
Erik Sintorn, Ulf Assarsson
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JPDC
Authors Erik Sintorn, Ulf Assarsson