A cluster-based approach to XML similarity joins



A natural consequence of the widespread adoption of XML as a standard for information representation and exchange is the redundant storage of large amounts of persistent XML documents. Compared to relational data tables, data represented in XML format can potentially be even more sensitive to data quality issues because structure, besides textual information, may cause variations in XML documents representing the same information entity. Therefore, correlating XML documents that are similar in content and structure is a fundamental operation. In this paper, we present an effective, flexible, and high-performance XML-based similarity join framework. We exploit structural summaries and clustering concepts to produce compact and high-quality XML document representations: our approach outperforms previous work in terms of both performance and accuracy. In this context, we explore different ways to weigh and combine evidence from textual and structural XML representations. Furthermore, w...

Leonardo Ribeiro, Theo Härder, Fernanda S. Pimenta


Database | IDEAS 2009 |


Added	24 May 2010
Updated	24 May 2010
Type	Conference
Year	2009
Where	IDEAS
Authors	Leonardo Ribeiro, Theo Härder, Fernanda S. Pimenta
