Sciweavers

367 search results - page 5 / 74
» Duplicate detection in probabilistic data
WWW
2005
ACM
14 years 11 months ago
Duplicate detection in click streams
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solu...
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi
PVLDB
2008
13 years 10 months ago
Industry-scale duplicate detection
Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. Duplicate detection is a problem of critical importance in...
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...
WWW
2008
ACM
14 years 11 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang, Xuemin Lin, Jeffrey Xu ...
ICDAR
1999
IEEE
14 years 3 months ago
Models and Algorithms for Duplicate Document Detection
This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...
Daniel P. Lopresti
GIS
2010
ACM
13 years 9 months ago
Detecting nearly duplicated records in location datasets
The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
Yu Zheng, Xixuan Fen, Xing Xie, Shuang Peng, James...