Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting a list of input numbers is one of the most fundamental problems in computer science in general and in high-throughput database applications in particular. Although the literature abounds with various flavors of sorting algorithms, different architectures call for customized implementations to achieve faster sorting times. This paper presents an efficient implementation and detailed analysis of MergeSort on current CPU architectures. Our SIMD implementation with 128-bit SSE is 3.3X faster than the scalar version. In addition, our algorithm performs an efficient multiway merge and is not constrained by the memory bandwidth. Our multi-threaded SIMD implementation sorts 64 million floating-point numbers in less than 0.5 seconds on a commodity 4-core Intel processor. This measured performance compares favorably with all previously published results. Additionally, the paper demonstrates performance scalability of the proposed sorting algorithm with respect to certain salient...
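The abstract does not spell out the merge kernel, so the following is only a sketch of one common way to merge sorted runs with 4-wide SIMD registers: a bitonic merge network, modeled here in scalar Python. The names `K`, `merge_kernel`, and `simd_style_merge` are hypothetical, and the choice of a bitonic network is an assumption rather than something stated above. In a real SSE version, each compare-exchange level would map to one lane-wise min, one lane-wise max, and a few shuffles on 128-bit registers.

```python
K = 4  # lanes in a 128-bit register holding 32-bit floats (assumed width)


def merge_kernel(a, b):
    """Merge two sorted K-element lists; return (K smallest, K largest).

    Reversing b makes a + b a bitonic sequence, which a fixed network of
    compare-exchange steps can sort without data-dependent branches.
    """
    v = list(a) + list(reversed(b))
    step = K
    while step >= 1:
        for i in range(2 * K):
            if i & step == 0:          # i is the lower index of its pair
                j = i + step
                if v[i] > v[j]:        # compare-exchange (min/max in SIMD)
                    v[i], v[j] = v[j], v[i]
        step //= 2
    return v[:K], v[K:]


def simd_style_merge(x, y):
    """Merge two sorted lists whose lengths are non-zero multiples of K."""
    assert len(x) >= K and len(y) >= K
    assert len(x) % K == 0 and len(y) % K == 0
    out = []
    low, high = merge_kernel(x[:K], y[:K])
    out.extend(low)
    i = j = K
    while i < len(x) or j < len(y):
        # Load the next block from the input whose next element is smaller;
        # this keeps the K smallest remaining values inside the kernel.
        if j >= len(y) or (i < len(x) and x[i] <= y[j]):
            block = x[i:i + K]
            i += K
        else:
            block = y[j:j + K]
            j += K
        low, high = merge_kernel(high, block)
        out.extend(low)
    out.extend(high)  # the last K values are already sorted
    return out


merged = simd_style_merge([1, 4, 6, 9], [2, 3, 7, 8])
print(merged)  # [1, 2, 3, 4, 6, 7, 8, 9]
```

The kernel runs the same sequence of comparisons regardless of the data, which is what makes it a candidate for SIMD execution; the only data-dependent branch left is the choice of which input to load from next.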

Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, Pradeep Dubey

PVLDB 2008 | SIMD Implementation | SIMD Width | Sorting |

Post Info

Added	28 Dec 2010
Updated	28 Dec 2010
Type	Journal
Year	2008
Where	PVLDB
Authors	Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, Pradeep Dubey
