SIGIR
2006
ACM

Finding near-duplicate web pages: a large-scale evaluation of algorithms

Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-the-art” algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very

Monika Rauch Henzinger

Related Content

» Efficient similarity joins for near duplicate detection

» Identifying and Filtering Near-Duplicate Documents

» PTAG large scale automatic generation of personalized annotation tags for the web

» Finding Related Pages Using the Link Structure of the WWW

» Detection of near-duplicate images for web search

» Utilizing Hyperlink Transitivity to Improve Web Page Clustering

» Addressing people's information needs directly in a web search result page

» Using Web Graph Structure for Person Name Disambiguation

» Personalizing Web Page Recommendation via Collaborative Filtering and Topic-Aware Markov Mo...

Post Info

Added	14 Jun 2010
Updated	14 Jun 2010
Type	Conference
Year	2006
Where	SIGIR
Authors	Monika Rauch Henzinger