near-duplicate documents

SIGIR
2010
ACM

Adaptive near-duplicate detection via similarity learning

15 years 10 months ago

In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...

Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz

LAWEB
2003
IEEE

On the Evolution of Clusters of Near-Duplicate Web Pages

16 years 6 days ago

This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...

Dennis Fetterly, Mark Manasse, Marc Najork
