A methodology for clustering XML documents by structure

The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed. © 2004 Elsevier B.V. All rights reserved.
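The pipeline outlined in the abstract (reduce each document to a structural representation, measure pairwise structural distance, then cluster hierarchically) can be illustrated with a minimal sketch. This is not the authors' method. The abstract does not specify the distance metric or how summaries are built, so the sketch substitutes a Jaccard distance over sets of root-to-node tag paths, and the path set only loosely plays the role of a structural summary by collapsing repeated elements. The names `tag_paths`, `structural_distance`, and `single_link_clusters`, and the merge threshold, are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def tag_paths(xml_text):
    """Return the set of distinct root-to-node tag paths of a document.

    Assumption: this path set is a crude stand-in for a structural
    summary; repeated sibling elements collapse into a single path.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def structural_distance(a, b):
    """Jaccard distance between two path sets (0.0 means same paths).

    Assumption: a simple substitute for a tree-based structural
    distance, not the metric used in the paper.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_link_clusters(docs, threshold):
    """Agglomerative single-link clustering over document indices.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    summaries = [tag_paths(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        # Single link: distance between the closest pair of members.
        return min(
            structural_distance(summaries[i], summaries[j])
            for i in c1
            for j in c2
        )

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: x < y, so index x is unaffected
    return clusters


if __name__ == "__main__":
    docs = [
        "<book><title/><author/><author/></book>",
        "<book><title/><author/></book>",
        "<order><item><price/></item></order>",
    ]
    print(single_link_clusters(docs, threshold=0.5))
```

In this toy input the two `book` documents share the same path set even though one repeats `author`, so they merge first, while the `order` document shares no paths with them and stays in its own cluster. A production version would replace the Jaccard stand-in with an ordered-tree distance, at higher computational cost.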

Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, Timos K. Sellis

Artificial Intelligence | IS 2006 | Similar Xml Documents | XML Data | XML Documents |

» Multisets and Clustering XML Documents

» Clustering XML Documents Using Selforganizing Maps for Structures

» Combining Structure and Content Similarities for XML Document Clustering

» A Methodology for Coupling Fragments of XPath with Structural Indexes for XML Documents

» FRACTURE mining Mining frequently and concurrently mutating structures from historical XML...

» XEdge clustering homogeneous and heterogeneous XML documents using edge summaries

» Clustering XML Documents by Structure

» XWarehousing An XMLBased Approach for Warehousing Complex Data

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	IS
Authors	Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, Timos K. Sellis
