Median-shift is a mode-seeking algorithm that relies on
computing the median of local neighborhoods, instead of
the mean. We further combine median-shift with Locality
Sensitive Hashing (LSH) and show that the combined algorithm
is suitable for clustering large-scale, high-dimensional
data sets. In particular, we propose a new mode-detection
step that greatly speeds up the overall procedure. In the
past, LSH was used in conjunction with mean shift only to
accelerate nearest-neighbor queries. Here we show that we
can analyze the density of the LSH bins to quickly detect
potential mode candidates and use only them to initialize
the median-shift procedure. We use the median instead of
the mean (or its discrete counterpart, the medoid) because
the median is more robust and because the median of a set
is a point in the set. A median is well-defined for scalars,
but there is no single agreed-upon extension of the median
to high-dimensional data. We adopt a particular extension,
kno...
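The pipeline described above (hash the data with LSH, treat dense bins as mode candidates, then run median-shift only from those candidates) can be illustrated with the minimal sketch below. It rests on several assumptions not stated in this excerpt: the hash family is the p-stable Gaussian random-projection family commonly used for Euclidean LSH; each bin with at least `min_count` points contributes one candidate, taken as the median of that bin; and, because the authors' choice of high-dimensional median is cut off here, a plain coordinate-wise median is used as a stand-in (unlike the median the abstract argues for, it is not guaranteed to be a point of the set). All function and parameter names (`lsh_mode_candidates`, `n_tables`, `k`, `w`, `radius`, `merge_tol`, `min_support`) are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import median


def coord_median(points):
    """Coordinate-wise median: a stand-in, NOT the paper's high-dimensional median."""
    return tuple(median(coords) for coords in zip(*points))


def make_lsh_key(dim, k, w, rng):
    """Assumed hash family: k quantized Gaussian random projections (p-stable LSH)."""
    projections = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                   for _ in range(k)]

    def key(x):
        return tuple(math.floor((sum(a_i * x_i for a_i, x_i in zip(a, x)) + b) / w)
                     for a, b in projections)
    return key


def lsh_mode_candidates(data, n_tables=4, k=2, w=1.0, min_count=5, seed=0):
    """Hash all points into n_tables tables; every dense bin yields one candidate."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_tables):
        key = make_lsh_key(len(data[0]), k, w, rng)
        bins = defaultdict(list)
        for p in data:
            bins[key(p)].append(p)
        candidates += [coord_median(m) for m in bins.values() if len(m) >= min_count]
    return candidates


def median_shift(x, data, radius, max_iter=50):
    """Replace x by the median of its radius-neighborhood until it stops moving.

    Returns the final point and the size of its last neighborhood (its support).
    """
    support = 0
    for _ in range(max_iter):
        nbrs = [p for p in data if math.dist(p, x) <= radius]  # brute-force scan
        support = len(nbrs)
        if not nbrs:
            break
        new_x = coord_median(nbrs)
        if math.dist(new_x, x) < 1e-9:  # converged: the median no longer moves
            break
        x = new_x
    return x, support


def find_modes(data, radius=1.5, merge_tol=0.5, min_support=5, **lsh_params):
    """Run median-shift only from LSH candidates, then merge duplicate modes."""
    modes = []
    for c in lsh_mode_candidates(data, **lsh_params):
        m, support = median_shift(c, data, radius)
        if support >= min_support and all(math.dist(m, q) > merge_tol for q in modes):
            modes.append(m)
    return modes


# Usage: two tight 2-D clusters plus one isolated outlier.
rng = random.Random(1)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(150)]
data += [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(150)]
data.append((20.0, -20.0))
modes = find_modes(data)
```

The isolated outlier lands in a bin of its own, so it never becomes a candidate and no median-shift run starts from it; this is where the savings over seeding from every point come from. The `min_support` filter is a hypothetical safeguard: a bin that happens to straddle two clusters can produce a candidate in the empty space between them, and such a candidate is discarded because its neighborhood is empty. The neighborhood search here is a linear scan for clarity; as the abstract notes, LSH can also be used to accelerate those nearest-neighbor queries.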