Efficient Semantic-Aware Detection of Near Duplicate Resources


Abstract. Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and redundancy, and to increase the diversity in the information provided to the user. In this paper, we introduce an approach for efficient semantic-aware near duplicate detection, by combining an indexing scheme for similarity search with the RDF representations of the resources. We provide a probabilistic analysis for the correctness of the suggested approach, which allows applications to configure it for satisfying their specific quality requirements. Our experimental evaluation on the RDF descriptions of real-world news articles from various news agencies demonstrates the efficiency and effectiveness of our approach.

Key words: near duplicate detection, data integration
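The abstract does not name the indexing scheme, so the following is only a rough sketch of the general idea, assuming MinHash signatures with LSH banding as the similarity index and assuming each resource is reduced to a non-empty set of (predicate, object) pairs taken from its RDF description. The function names, the feature choice, and the sample data are all hypothetical and are not taken from the paper. The last function gives the standard banding probability, which illustrates how an application might tune the number of bands and rows toward a quality target.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def feature_hash(feature, seed):
    """64-bit hash of one (predicate, object) pair under a given seed."""
    data = f"{seed}|{'|'.join(feature)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes):
    """MinHash signature of a non-empty feature set (assumption: never empty)."""
    return tuple(
        min(feature_hash(f, seed) for f in features) for seed in range(num_hashes)
    )


def candidate_pairs(resources, bands, rows):
    """Return pairs of resource names that share at least one LSH band bucket."""
    buckets = defaultdict(set)
    for name, features in resources.items():
        sig = minhash_signature(features, bands * rows)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidates."""
    return len(a & b) / len(a | b)


def candidate_probability(similarity, bands, rows):
    """Probability that a pair with the given Jaccard similarity collides
    in at least one band (standard MinHash banding analysis)."""
    return 1.0 - (1.0 - similarity ** rows) ** bands


# Hypothetical RDF-derived descriptions of three news articles.
article = {("dc:title", "Storm hits coast"), ("dc:creator", "Agency A"),
           ("dc:subject", "weather")}
resources = {
    "a": article,
    "b": set(article),  # exact copy syndicated by another source
    "c": {("dc:title", "Election results"), ("dc:subject", "politics")},
}
pairs = candidate_pairs(resources, bands=4, rows=2)
```

In this sketch, candidate pairs would then be checked with `jaccard` against an application-chosen threshold; raising `bands` increases the chance that true near duplicates are found, while raising `rows` reduces spurious candidates.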

Ekaterini Ioannou, Odysseas Papapetrou, Dimitrios Skoutas, Wolfgang Nejdl


Duplicate Detection | Duplicate Resources | ESWS 2010 | Important Task | Internet Technology


» SpotSigs: robust and efficient near duplicate detection in large web collections

» Optimizing Near Duplicate Detection for P2P Networks

» MyFinder: near-duplicate detection for large image collections

» Statistical inference of chromosomal homology based on gene colinearity and applications t...

» Top-k Set Similarity Joins

» Near-duplicate keyframe retrieval with visual keywords and semantic context

» Cleaning Web Pages for Effective Web Content Mining

» Parallel Processing of High-Dimensional Remote Sensing Images Using Cluster Computer Archit...

Post Info

Added	02 Sep 2010
Updated	02 Sep 2010
Type	Conference
Year	2010
Where	ESWS
Authors	Ekaterini Ioannou, Odysseas Papapetrou, Dimitrios Skoutas, Wolfgang Nejdl


Sciweavers
