We show how to efficiently mine a set of directed acyclic graphs (DAGs) for unconnected, single- or multi-rooted, induced fragments. Using a new canonical form based on the nodes' topological levels, our miner is faster and uses less storage than the general-purpose gSpan (Yan, X. and Han, J., 2002). Moreover, it can base the support (frequency) of a fragment either on the number of its embeddings in the database or on the number of graphs it appears in. This is crucial for finding frequent fragments in data flow graphs generated from assembly code: extracting such fragments into new procedures reduces the total code size. The paper shows that our miner outperforms general-purpose mining and demonstrates the quantitative effects of DAG mining on program size reduction.
KEYWORDS Graph Mining, Compiler Construction
T. Werth, A. Dreweke, Marc Wörlein, Ingrid Fi
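The two ingredients named in the abstract, topological levels and the two support measures, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a node's topological level is the length of the longest path from any root to that node, and that an embedding is recorded as a hypothetical pair `(graph_id, node_set)`; the function names are likewise illustrative. The embedding count here is naive and ignores any handling of overlapping embeddings.

```python
from collections import deque


def topological_levels(nodes, edges):
    """Return {node: level} for a DAG.

    Assumed definition: level = length of the longest path from any root
    (a node without incoming edges) to the node. Roots have level 0.
    """
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    level = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            # A node sits one level below its deepest predecessor.
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if visited != len(nodes):
        raise ValueError("input graph contains a cycle")
    return level


def embedding_support(embeddings):
    """Embedding-based support: every occurrence counts (naive, no overlap check)."""
    return len(embeddings)


def graph_support(embeddings):
    """Graph-based support: each database graph counts at most once."""
    return len({graph_id for graph_id, _ in embeddings})


# A small multi-path DAG: a -> b -> d, a -> c -> d, a -> d
levels = topological_levels(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")],
)

# Hypothetical embeddings of one fragment: twice in graph 0, once in graph 2
found = [(0, frozenset({1, 2})), (0, frozenset({3, 4})), (2, frozenset({1, 2}))]
print(levels, embedding_support(found), graph_support(found))
```

In this example the fragment has an embedding-based support of 3 but a graph-based support of 2. The difference matters for code size reduction, where a fragment repeated many times inside a single data flow graph is still worth extracting.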