Graph grammars combine the relational aspect of graphs with the iterative and recursive aspects of string grammars, and thus represent an important next step in our ability to discover knowledge from data. In this paper we describe an approach to learning node replacement graph grammars. This approach is based on previous research in frequent isomorphic subgraphs discovery. We extend the search for frequent subgraphs by checking for overlap among the instances of the subgraphs in the input graph. If subgraphs overlap by one node we propose a node replacement grammar production. We also can infer a hierarchy of productions by compressing portions of a graph described by a production and then infer new productions on the compressed graph. We validate this approach in experiments where we generate graphs from known grammars and measure how well our system infers the original grammar from the generated graph. We also describe results on several real-world tasks from chemical mining to XML...
Jacek P. Kukluk, Lawrence B. Holder, Diane J. Cook