Search Sciweavers | Sciweavers

WWW
2006
ACM

139 views, Internet Technology

Do not crawl in the DUST: different URLs with similar text

16 years 1 month ago

Download www2007.org

We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...

Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar

WWW
2010
ACM

234 views, Internet Technology

A pattern tree-based approach to learning URL normalization rules

16 years 2 months ago

Download research.microsoft.com

Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...

Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...

WWW
2008
ACM

152 views, Internet Technology

Behavioral classification on the click graph

16 years 8 months ago

Download www2008.org

A bipartite query-URL graph, where an edge indicates that a document was clicked for a query, is a useful construct for finding groups of related queries and URLs. Here we use thi...

Martin Szummer, Nick Craswell

STACS
2009
Springer

139 views, Theoretical Computer Science

A Comparison of Techniques for Sampling Web Pages

16 years 2 months ago

Download www.ra.ethz.ch

As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to...

Eda Baykan, Monika Rauch Henzinger, Stefan F. Kell...

WEBDB
2007
Springer

159 views, Database

A clustering-based sampling approach for refreshing search engine's database

16 years 1 month ago

Download leo.saclay.inria.fr

Due to resource constraints, search engines usually have difficulties keeping the local database completely synchronized with the Web. To detect as many changes as possible, the ...

Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. L...
