Various problems in machine learning, databases, and statistics involve pairwise distances among a set of objects. It is often desirable for these distances to satisfy the properties of a metric, especially the triangle inequality. Applications where metric data is useful include clustering, classification, metric-based indexing, and approximation algorithms for various graph problems. This paper presents the Metric Nearness Problem: Given a dissimilarity matrix, find the "nearest" matrix of distances that satisfy the triangle inequalities. For p nearness measures, this paper develops efficient triangle fixing algorithms that compute globally optimal solutions by exploiting the inherent structure of the problem. Empirically, the algorithms have time and storage costs that are linear in the number of triangle constraints. The methods can also be easily parallelized for additional speed.
Inderjit S. Dhillon, Suvrit Sra, Joel A. Tropp