We propose a novel and efficient solution to the problem of clustering XML documents based on their structure. We use operations on multisets of paths of document trees to define certain metrics on multisets. These metrics are used for clustering real and synthesized XML documents to produce high-quality clusterings.
Swami Iyer, Dan A. Simovici