Given a set of model graphs D and a query graph q, containment search aims to find all model graphs g D such that q contains g (q g). Due to the wide adoption of graph models, fast containment search of graph data finds many applications in various domains. In comparison to traditional graph search that retrieves all the graphs containing q (q g), containment search has its own indexing characteristics that have not yet been examined. In this paper, we perform a systematic study on these characteristics and propose a contrast subgraph-based indexing model, called cIndex. Contrast subgraphs capture the structure differences between model graphs and query graphs, and are thus perfect for indexing due to their high selectivity. Using a redundancy-aware feature selection process, cIndex can sort out a set of significant and distinctive contrast subgraphs and maximize its indexing capability. We show that it is NP-complete to choose the best set of indexing features, and our greedy algo...
Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, D