The relative difference between two data values is of interest in a number of application domains, including temporal and spatial applications, schema versioning, data warehousing (particularly data preparation), internet searching, validation and error correction, and data mining. Moreover, consistency across systems in determining such distances, and the robustness of such calculations, are essential in some domains and useful in many. Despite this, there is no generally adopted approach to determining such distances and no accommodation of distance within SQL or any commercially available DBMS. For non-numeric data values, calculating the difference between values often requires application-specific support, but even for numeric values the practical distance between two values may not simply be their numeric difference or Euclidean distance. In this paper, a model of semantic distance is developed in which a graph-based approach is used to quantify the distance between two data values....
John F. Roddick, Kathleen Hornsby, Denise de Vries