

Combining Structure and Content Similarities for XML Document Clustering

This paper proposes a clustering approach that exploits both the content and the structure of XML documents to determine the similarity among them. The approach assumes that content and structure play different roles and carry different importance depending on the use and purpose of a dataset. It therefore handles the content and structure information of the documents with two different similarity-measuring methods. The similarity values produced by these two methods are then combined with weightings to measure the overall document similarity. The effect of structure similarity and content similarity on the clustering solution is thoroughly analysed. The experiments show that clustering text-centric XML documents on content-only information produces a better solution in a homogeneous environment, where all documents derive from one structural definition. However, in a heterogeneous environment, where documents derive from two or more structural definitions, clustering of the text-...
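The weighted combination of a content score and a structure score can be sketched as follows. The abstract does not name the two similarity measures or the weighting scheme, so this is only a minimal illustration under stated assumptions. Cosine similarity over word counts stands in for content similarity, and Jaccard similarity over root-to-node tag paths stands in for structure similarity. The single weight `w_structure`, with content weighted `1 - w_structure`, is a hypothetical parameter. The names `extract_features`, `content_similarity`, `structure_similarity` and `combined_similarity` are illustrative and are not the authors' method.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def extract_features(xml_text):
    """Return (word counts, set of tag paths) for one XML document.

    Assumption: content = lower-cased whitespace tokens of all text;
    structure = set of root-to-node tag paths such as '/article/title'.
    """
    root = ET.fromstring(xml_text)
    words, paths = [], set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        if elem.text:
            words.extend(elem.text.lower().split())
        for child in elem:
            walk(child, path)
            if child.tail:  # text that follows a child element
                words.extend(child.tail.lower().split())

    walk(root, "")
    return Counter(words), paths


def content_similarity(a, b):
    """Cosine similarity between two word-count vectors (stand-in measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def structure_similarity(a, b):
    """Jaccard similarity between two sets of tag paths (stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(words_a, words_b, paths_a, paths_b, w_structure=0.5):
    """Weighted sum: w_structure * structure + (1 - w_structure) * content."""
    if not 0.0 <= w_structure <= 1.0:
        raise ValueError("w_structure must lie in [0, 1]")
    s_structure = structure_similarity(paths_a, paths_b)
    s_content = content_similarity(words_a, words_b)
    return w_structure * s_structure + (1.0 - w_structure) * s_content


doc_a = "<article><title>XML clustering</title><body>structure and content</body></article>"
doc_b = "<paper><title>XML clustering</title><abstract>structure and content</abstract></paper>"
words_a, paths_a = extract_features(doc_a)
words_b, paths_b = extract_features(doc_b)
for w in (0.0, 0.5, 1.0):
    print(w, combined_similarity(words_a, words_b, paths_a, paths_b, w))
```

The two sample documents share the same text but no tag paths. A content-only weighting (`w_structure = 0.0`) rates them as identical, and a structure-only weighting (`w_structure = 1.0`) rates them as unrelated. This is why the choice of weight matters once documents come from different structural definitions.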
Tien Tran, Richi Nayak, Peter Bruza
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008