In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in the TAX algebra, we treat an XML element as a labeled, ordered, rooted tree. Considering that XML nodes can be either atomic, i.e., they contain single values such as short character strings, dates, etc., or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs, for atomic nodes, and MCVs, for complex nodes. In the first case, we suggest the use of several application-domain-dependent metrics. In the second case, we define metrics for complex values that are structure dependent and can be applied separately to tuples and to collections of values. We also present experiments showing the effectiveness of our method.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

General Terms
Experimentation, Measurement

Keywords
Similarity functions, Vague que...
Carina F. Dorneles, Carlos A. Heuser, Andrei E. N.