Search Sciweavers | Sciweavers

» Finding near-duplicate web pages: a large-scale evaluation o...

WWW
2008
ACM

Efficient similarity joins for near duplicate detection

Download www2008.org

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...

Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...

CPM
2000
Springer

Identifying and Filtering Near-Duplicate Documents

Download www.cs.princeton.edu

Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...

Andrei Z. Broder

WWW
2007
ACM

P-TAG: large scale automatic generation of personalized annotation tags for the web

Download www2007.org

The success of the Semantic Web depends on the availability of Web pages annotated with metadata. Free form metadata or tags, as used in social bookmarking and folksonomies, have ...

Paul-Alexandru Chirita, Stefania Costache, Wolfgan...

SIGIR
2006
ACM

Finding near-duplicate web pages: a large-scale evaluation of algorithms

Download ltaa.epfl.ch

Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-the-art” algorithms for finding near-duplicate web pag...

Monika Rauch Henzinger

WEBI
2004
Springer

Finding Related Pages Using the Link Structure of the WWW

Download www.kbs.uni-hannover.de

Most of the current algorithms for finding related pages are exclusively based on text corpora of the WWW or incorporate only authority or hub values of pages. In this paper, we ...

Paul-Alexandru Chirita, Daniel Olmedilla, Wolfgang...
