Abstract. Because online information is dynamic, XML documents typically evolve over time, and the changes to the data values or structure of an XML document may exhibit particular patterns. In this paper, we focus on the sequence of changes to the structure of an XML document and identify which subtrees in the XML structure frequently change together; we call these Frequently Changing Subtree Patterns (FCSP). To keep the set of discovered patterns concise, we further define the problem of mining maximal FCSPs. We develop an algorithm derived from FP-growth to mine the set of maximal FCSPs. Experimental results show that our algorithm is substantially faster than the naive algorithm and that it scales well with respect to the size of the XML structure.
Ling Chen, Sourav S. Bhowmick, Liang-Tien Chi
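To make the notion of subtrees that "frequently change together" and of maximal patterns concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: each version-to-version delta is reduced to the set of subtree identifiers that changed, a pattern counts as frequent when the fraction of deltas containing all of its subtrees reaches a hypothetical `min_support` threshold, and patterns are enumerated by brute force rather than with the FP-growth-derived algorithm the abstract refers to. The names `frequent_patterns`, `maximal_patterns`, and the sample subtree labels are illustrative only.

```python
from itertools import combinations


def frequent_patterns(deltas, min_support):
    """Brute-force enumeration of subtree sets that change together.

    deltas: list of sets; each set holds the (hypothetical) identifiers of
            the subtrees that changed between two consecutive versions.
    Returns a dict mapping each frequent pattern (frozenset) to its support.
    """
    n = len(deltas)
    items = sorted(set().union(*deltas)) if deltas else []
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            # Count the deltas in which every subtree of the pattern changed.
            count = sum(1 for d in deltas if pattern <= d)
            if count / n >= min_support:
                result[pattern] = count / n
                found = True
        if not found:
            # Support is anti-monotone: if no pattern of this size is
            # frequent, no larger pattern can be frequent either.
            break
    return result


def maximal_patterns(frequent):
    """Keep only the patterns that have no frequent proper superset."""
    return {p for p in frequent if not any(p < q for q in frequent)}


# Four deltas over a toy document structure (assumed labels).
deltas = [
    {"book", "author"},
    {"book", "author", "price"},
    {"price"},
    {"book", "author"},
]
freq = frequent_patterns(deltas, min_support=0.5)
maximal = maximal_patterns(freq)
```

In this toy input, the singletons `{book}` and `{author}` are frequent but are subsumed by `{book, author}`, so reporting only maximal patterns yields a shorter result without losing which subtrees co-change.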