We address the problem of similarity metric selection in pairwise affinity clustering. Traditional techniques employ standard algebraic, context-independent sample-distance measures such as the Euclidean distance. More recent context-dependent metric modifications employ the bottleneck principle to develop path-bottleneck or path-average distances and define similarities based on geodesics determined according to these metrics. This paper develops a principled context-adaptive similarity metric for pairs of feature vectors that utilizes the probability density of all data. Specifically, based on the postulate that Euclidean distance is the canonical metric for data drawn from a unit-hypercube uniform density, a density-geodesic distance measure stemming from the Riemannian geometry of curved surfaces is derived. Comparisons with alternative metrics demonstrate its superior properties, such as robustness.
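As a point of reference for the path-bottleneck distances mentioned above, the following is a minimal sketch of one common formulation: the distance between two samples is the smallest achievable value of the largest hop along any path through the data. It illustrates the earlier context-dependent approach only, not the density-geodesic metric derived in this paper. The function names and the toy points are hypothetical, and a fully connected graph with Euclidean edge lengths is assumed.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_bottleneck_distances(points):
    """All-pairs path-bottleneck (minimax) distances.

    Assumption: every pair of samples is joined by an edge whose length is
    their Euclidean distance. The bottleneck of a path is its longest edge,
    and the distance between i and j is the smallest bottleneck over all
    paths from i to j. This is a Floyd-Warshall-style update in which the
    usual sum of sub-path lengths is replaced by their maximum.
    """
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])  # bottleneck of the route through k
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Toy data (hypothetical): four samples in a chain plus one distant sample.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 5.0)]
bottleneck = path_bottleneck_distances(points)
```

In this toy set, the two ends of the chain are 3 units apart in Euclidean terms, yet their bottleneck distance is set by the unit-length hops between neighbours, while the distant sample stays far from the chain because reaching it requires one long hop. The sketch runs in cubic time and is intended for illustration on small sample sets only.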
Umut Ozertem, Deniz Erdogmus, Miguel Á. Car