Because they can express the semantics of, and relationships among, data objects, semi-structured documents have become a common way of representing domain knowledge. Comparing the structures of semi-structured data objects often reveals useful information, and hence tree and graph mining have become useful in areas such as bioinformatics, ontology mining, Web mining, XML mining, and schema matching. The type of sub-structure to be mined differs according to the needs of the application. An important problem in ontology matching is that of matching sub-structures in addition to matching concepts. Sub-structure matching can often help filter out ‘false matches’ in simple concept matching, and it creates the need for distance-constrained subtree matching. Our work focuses on the task of mining frequent subtrees from a database of rooted ordered labeled trees. Previously we have developed an efficient ...
Henry Tan, Tharam S. Dillon, Fedja Hadzic, Elizabe