Several XML query languages have been proposed that use XPath expressions to locate data. However, an XPath expression can miss data because of irregularities in the data and schema of an XML data collection. In this paper we propose ApproXPath, which supports approximate path expressions. Approximate path expressions have the same syntax as XPath expressions, but tolerate content and structural errors. An error is a string or tree edit operation that creates a (virtual) data collection in which the data can be located. ApproXPath extends XPath’s axes, node tests, and predicates to use string and tree edit distance. We show that the complexity of ApproXPath is reasonable. For many queries, inexact matching with no errors allowed is as fast as exact matching, and the cost increases linearly with the number of errors allowed.
Lin Xu, Curtis E. Dyreson
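
The idea of a node test that tolerates content errors can be illustrated with a minimal sketch. This is not the ApproXPath implementation: the function names (`edit_distance`, `approx_children`, `approx_path`), the sample document, and the choice to apply the error budget per step rather than per path are all assumptions made for illustration. The sketch covers only string edits on element names along the child axis; structural (tree edit) errors are not modeled.

```python
import xml.etree.ElementTree as ET


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approx_children(node, name, max_errors):
    """Children of `node` whose tag is within `max_errors` edits of `name`."""
    return [c for c in node if edit_distance(c.tag, name) <= max_errors]


def approx_path(root, steps, max_errors):
    """Evaluate a child-axis path from `root`.

    Assumption: the error budget applies to each step independently,
    which is a simplification chosen to keep the sketch short.
    """
    current = [root]
    for step in steps:
        current = [c for n in current
                   for c in approx_children(n, step, max_errors)]
    return current


# Hypothetical document with one misspelled element name ("auther").
DOC = ET.fromstring(
    "<library>"
    "<book><author>Knuth</author></book>"
    "<book><auther>Lamport</auther></book>"
    "</library>"
)

# Roughly /library/book/author with 0 and with 1 allowed error per step.
exact = [e.text for e in approx_path(DOC, ["book", "author"], 0)]
approx = [e.text for e in approx_path(DOC, ["book", "author"], 1)]
```

With no errors allowed, the path behaves like an exact match and misses the misspelled element; allowing one edit per step also locates it.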