Users querying massive social networks or RDF databases are often not 100% certain about what they are looking for due to the complexity of the query or heterogeneity of the data. In this paper, we propose “probabilistic subgraph” (PS) queries over a graph/network database, which afford users great flexibility in specifying “approximately” what they are looking for. We formally define the probability that a substitution satisfies a PS-query with respect to a graph database. We then present the PMATCH algorithm to answer such queries and prove its correctness. Our experimental evaluation demonstrates that PMATCH is efficient and scales to massive social networks with over a billion edges.
Matthias Bröcheler, Andrea Pugliese, V. S. Su