Sorting is a core operation in many critical applications. Despite considerable efforts to parallelize it, its high data dependencies severely limit its performance. Multithreaded architectures are emerging as the most in-demand technology in leading-edge processors. These architectures include Simultaneous Multithreading, Chip Multiprocessors, and machines that combine different multithreading technologies. In this paper, we analyze the memory behavior and improve the performance of the most recent parallel radix and quick integer sort algorithms on modern multithreaded architectures. Compared to their single-threaded versions, we achieve speedups of up to 4.69x for radix sort and up to 4.17x for quick sort on a machine with 4 multithreaded processors. We find that since radix sort is CPU-intensive, it exhibits better results on Chip Multiprocessors, where multiple CPUs are available. While quick sort achieves speedups on all types of multithreading...
Layali K. Rashid, Wessam Hassanein, Moustafa A. Ha