Search Sciweavers | Sciweavers

367 search results - page 5 / 74

» Duplicate detection in probabilistic data

WWW 2005, ACM

Duplicate detection in click streams

We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solu...

Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi

PVLDB 2008

Industry-scale duplicate detection

Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. Duplicate detection is a problem of critical importance in...

Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...

WWW 2008, ACM

Efficient similarity joins for near duplicate detection

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...

Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...

ICDAR 1999, IEEE

Models and Algorithms for Duplicate Document Detection

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...

Daniel P. Lopresti

GIS 2010, ACM

Detecting nearly duplicated records in location datasets

The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...

Yu Zheng, Xixuan Fen, Xing Xie, Shuang Peng, James...
