In many important problems, one uses the median instead of the mean to estimate a population's center, since the former is more robust. But in general, computing the median is considerably slower than the standard mean calculation, and a fast median algorithm is of interest. The fastest existing algorithm is quickselect. We investigate a novel algorithm binmedian, which has O(n) average complexity. The algorithm uses a recursive binning scheme and relies on the fact that the median and mean are always at most one standard deviation apart. We also propose a related median approximation algorithm binapprox, which has O(n) worst-case complexity. These algorithms are highly competitive with quickselect when computing the median of a single data set, but are significantly faster in updating the median when more data is added.
Ryan J. Tibshirani