Search Sciweavers | Sciweavers

31 search results - page 1 / 7

» Detecting near-duplicates for web crawling

SIGIR
2008
ACM

Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

15 years 7 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

LAWEB
2003
IEEE

Internet Technology

On the Evolution of Clusters of Near-Duplicate Web Pages

16 years 26 days ago

Download research.microsoft.com

This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...

Dennis Fetterly, Mark Manasse, Marc Najork

WWW
2008
ACM

Internet Technology

Detecting image spam using visual features and near duplicate detection

16 years 8 months ago

Download www2008.org

Email spam is a much studied topic, but even though current email spam detecting software has been gaining a competitive edge against text based email spam, new advances in spam g...

Bhaskar Mehta, Saurabh Nangia, Manish Gupta 0002, ...

ICMCS
2007
IEEE

Multimedia

SICO: A System for Detection of Near-Duplicate Images During Search

16 years 1 month ago

Download goanna.cs.rmit.edu.au

Duplicate and near-duplicate digital image matching is beneficial for image search in terms of collection management, digital content protection, and search efficiency. In this ...

Jun Jie Foo, Ranjan Sinha, Justin Zobel

WWW
2008
ACM

Internet Technology

Efficient similarity joins for near duplicate detection

16 years 8 months ago

Download www2008.org

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...

Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
