Sciweavers

ADBIS
2015
Springer

Optimizing Sort in Hadoop Using Replacement Selection

This paper presents and evaluates an alternative sorting component for Hadoop based on the replacement selection algorithm. Compared with the default quicksort-based implementation, replacement selection generates runs that are on average twice as large. This makes the merge phase more efficient, since the amount of data that can be merged in one pass increases on average by a factor of two. For almost-sorted inputs, replacement selection can often sort an arbitrarily large file in a single pass, eliminating the need for a merge phase. This paper evaluates an implementation of replacement selection for MapReduce computations in the Hadoop framework. We show that its performance is comparable to quicksort for random inputs, with substantial gains for inputs that are either almost sorted or require two merge passes with quicksort.
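The run-generation idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's Hadoop implementation: the function name `replacement_selection`, the parameter `memory_size`, and the use of plain integer keys are assumptions made for illustration only. A min-heap holds at most `memory_size` records; each record emitted to the current run is replaced by the next input record, which joins the current run if it is not smaller than the record just emitted and is otherwise deferred to the next run.

```python
import heapq
import itertools
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking exhausted input


def replacement_selection(records: Iterable[int], memory_size: int) -> Iterator[List[int]]:
    """Yield sorted runs using a heap of at most `memory_size` records (illustrative sketch)."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    it = iter(records)
    # Heap entries are (run_number, key), so records deferred to a later run
    # always sort after every record that still belongs to the current run.
    heap = [(0, key) for key in itertools.islice(it, memory_size)]
    heapq.heapify(heap)
    current_run = 0
    run: List[int] = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != current_run:
            # The current run has no records left in memory: close it.
            yield run
            run = []
            current_run = run_no
        run.append(key)
        nxt = next(it, _END)
        if nxt is not _END:
            # A record smaller than the one just emitted cannot extend this run.
            target = run_no if nxt >= key else run_no + 1
            heapq.heappush(heap, (target, nxt))
    if run:
        yield run
```

With an input that is already sorted, every incoming record can extend the current run, so the sketch produces a single run regardless of `memory_size`. With a reverse-sorted input, each run is limited to `memory_size` records, which is the same run length a sort-one-buffer-at-a-time approach would produce.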
Pedro Martins Dusso, Caetano Sauer, Theo Härder
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ADBIS
Authors Pedro Martins Dusso, Caetano Sauer, Theo Härder