We consider the problem of a user querying semistructured data such as RDF without knowing its structure. In these circumstances, it is helpful if the querying system can match the user’s query to the data approximately and rank the answers by how closely they match the original query. Our approximate matching framework allows us to incorporate standard notions of approximation such as edit distance as well as certain RDFS inference rules, thereby capturing semantic as well as syntactic approximations. The query language we adopt comprises conjunctions of regular path queries, thus including extensions proposed for SPARQL to allow for querying paths using regular expressions. We provide an incremental query evaluation algorithm which runs in polynomial time and returns answers to the user in ranked order.
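To make the idea of ranked approximate matching concrete, the following is a minimal sketch and not the evaluation algorithm of the paper. It assumes a single regular path query supplied as a hand-built NFA, a graph given as labelled edges, unit costs for the three edit operations, and a fixed source node; RDFS inference rules and conjunctions of queries are left out. The names `ranked_matches`, `edges`, and `nfa` are hypothetical. The search is a standard shortest-path (Dijkstra) traversal over pairs of graph node and automaton state, so answers are produced one at a time in non-decreasing edit distance.

```python
import heapq
from itertools import count


def ranked_matches(edges, nfa, start_state, final_states, source,
                   ins_cost=1, del_cost=1, sub_cost=1):
    """Yield (distance, node) pairs in non-decreasing distance order.

    edges: iterable of (subject, label, object) triples.
    nfa:   dict mapping a state to a list of (label, next_state) pairs.
    The cost parameters are illustrative assumptions, not values from the paper.
    """
    out = {}
    for s, label, t in edges:
        out.setdefault(s, []).append((label, t))

    tie = count()  # tie-breaker so heap entries never compare nodes or states
    heap = [(0, next(tie), source, start_state)]
    settled = set()   # (node, state) pairs whose distance is final
    reported = set()  # nodes already returned as answers

    while heap:
        dist, _, node, state = heapq.heappop(heap)
        if (node, state) in settled:
            continue
        settled.add((node, state))

        if state in final_states and node not in reported:
            reported.add(node)
            yield dist, node

        transitions = nfa.get(state, [])
        for label, target in out.get(node, []):
            # Follow a graph edge and a query symbol together:
            # free if the labels agree, otherwise a substitution.
            for q_label, next_state in transitions:
                step = 0 if label == q_label else sub_cost
                heapq.heappush(heap, (dist + step, next(tie), target, next_state))
            # Follow a graph edge the query does not mention (insertion).
            heapq.heappush(heap, (dist + ins_cost, next(tie), target, state))
        # Skip a query symbol without moving in the graph (deletion).
        for _, next_state in transitions:
            heapq.heappush(heap, (dist + del_cost, next(tie), node, next_state))


# Toy data: the query asks for the path worksFor followed by locatedIn.
edges = [
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "london"),
    ("alice", "livesIn", "paris"),
    ("paris", "capitalOf", "france"),
]
nfa = {0: [("worksFor", 1)], 1: [("locatedIn", 2)]}

answers = list(ranked_matches(edges, nfa, 0, {2}, "alice"))
for distance, node in answers:
    print(distance, node)
```

Because the function is a generator, a caller can stop after the first few answers without exploring the rest of the search space, which is one simple way to read "incremental" evaluation. On the toy data the exact answer is reported first, followed by progressively looser matches.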
Carlos A. Hurtado, Alexandra Poulovassilis, Peter