We consider the problem of finding and ranking paths in semistructured data when its full structure is not necessarily known. The query language we adopt comprises conjunctions of regular path queries and allows path variables to appear in both the bodies and the heads of rules, so that paths can be returned to the user. We propose an approximate query matching semantics that adapts standard notions of approximation from string matching to graph matching. Query results are returned to the user ranked in order of increasing “distance” from the user’s original query. We show that the top-k approximate answers can be returned in time polynomial in the size of the database graph and the query.
Carlos A. Hurtado, Alexandra Poulovassilis, Peter
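To make the idea of ranking answers by increasing distance concrete, the following is a minimal Python sketch, not the authors' algorithm. It rests on several simplifying assumptions that are not in the abstract: the query is a fixed sequence of edge labels rather than a full regular expression, the edit operations (insertion, deletion, substitution of a label) each cost 1, and answers are end nodes rather than whole paths. The function name `top_k_approx`, the adjacency-list graph encoding, and the sample data are all hypothetical. The sketch runs Dijkstra's algorithm over pairs (graph node, query position), so answers are produced in order of nondecreasing distance.

```python
import heapq

def top_k_approx(graph, start, query, k):
    """Return up to k (distance, node) pairs, cheapest first.

    graph: dict mapping node -> list of (edge_label, target_node)
    query: list of edge labels (a simplified path query; assumption)
    Unit-cost edit operations are assumed for illustration.
    """
    n = len(query)
    settled = set()            # (node, query position) pairs already finalized
    heap = [(0, start, 0)]     # (distance so far, node, labels consumed)
    answers = []
    while heap and len(answers) < k:
        dist, node, i = heapq.heappop(heap)
        if (node, i) in settled:
            continue
        settled.add((node, i))
        if i == n:
            # Whole query consumed: node is an answer at this distance.
            answers.append((dist, node))
        if i < n:
            # Deletion: skip a query label without moving in the graph.
            heapq.heappush(heap, (dist + 1, node, i + 1))
        for label, target in graph.get(node, []):
            # Insertion: traverse an edge the query did not ask for.
            heapq.heappush(heap, (dist + 1, target, i))
            if i < n:
                # Exact match costs 0; substitution of the label costs 1.
                cost = 0 if label == query[i] else 1
                heapq.heappush(heap, (dist + cost, target, i + 1))
    return answers

# Hypothetical sample data.
graph = {
    "a": [("knows", "b"), ("likes", "c")],
    "b": [("worksAt", "d")],
    "c": [("worksAt", "e")],
}
result = top_k_approx(graph, "a", ["knows", "worksAt"], 3)
print(result)
```

On this sample graph the exact answer `d` is returned first at distance 0, followed by two answers at distance 1: `b` (the label `worksAt` is deleted from the query) and `e` (the label `knows` is substituted by `likes`). The search space has at most (number of nodes) × (query length + 1) states, which is one way to see why a polynomial bound is plausible under these simplified assumptions.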