We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors $D(P, C) = \sum_{p \in P} \min_{c \in C} D(p, c)$ is minimized. The main result in this article can be stated as follows: There exists a $(1 + \epsilon)$-approximation algorithm for the k-median problem with respect to D, if the 1-median problem can be approximated within a factor of $(1 + \epsilon)$ by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time $n \, 2^{O(mk \log(mk/\epsilon))}$, where m is a constant that depends only on $\epsilon$ and D. Using this characterization, we obtain the first linear time $(1 + \epsilon)$-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman diver...
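The sum-of-errors objective and the sampling condition on the 1-median problem can be illustrated with a minimal Python sketch. It is not the algorithm of the article. The function names (`kl_divergence`, `cost`, `centroid`, `sampled_one_median`) and the `sample_size` parameter are hypothetical, chosen for illustration. The sketch also assumes the standard fact, not stated in the abstract, that for a Bregman divergence such as the generalized Kullback-Leibler divergence the centroid of a point set is an exact 1-median, so "solving the 1-median problem on the sample exactly" reduces to averaging the sample.

```python
import math
import random


def kl_divergence(p, c):
    """Generalized Kullback-Leibler divergence D(p, c) for strictly positive vectors."""
    return sum(pi * math.log(pi / ci) - pi + ci for pi, ci in zip(p, c))


def cost(points, centers, dissimilarity):
    """Sum of errors D(P, C): every point pays for its cheapest center."""
    return sum(min(dissimilarity(p, c) for c in centers) for p in points)


def centroid(points):
    """Coordinate-wise mean (assumed here to be the exact 1-median for Bregman divergences)."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sampled_one_median(points, sample_size, rng):
    """Approximate the 1-median by solving it exactly on a uniform random sample."""
    sample = [rng.choice(points) for _ in range(sample_size)]  # sampling with replacement
    return centroid(sample)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    P = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.3, 0.7)]
    c = sampled_one_median(P, sample_size=3, rng=rng)
    print("sampled center:", c)
    print("cost of sampled center:", cost(P, [c], kl_divergence))
    print("cost of exact centroid:", cost(P, [centroid(P)], kl_divergence))
```

Comparing the two printed costs shows how close a center computed from a small sample comes to the optimum on the full set. The sample size that guarantees a $(1 + \epsilon)$ factor depends on $\epsilon$ and D and is not derived here.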
Marcel R. Ackermann, Johannes Blömer, Christi