Sciweavers

11 search results - page 1 / 3
» On URL Normalization
Sort
View
ICCSA
2005
Springer
14 years 29 days ago
On URL Normalization
Since syntactically different URLs could represent the same resource in WWW, there are on-going efforts to define the URL normalization in the standard communities. This paper cons...
Sang Ho Lee, Sung Jin Kim, Seok-Hoo Hong
HUMAN
2005
Springer
14 years 1 months ago
How to Evaluate the Effectiveness of URL Normalizations
Syntactically different URLs could represent the same web page on the World Wide Web, and duplicate representation for web pages causes web applications to handle a large amount of...
Sang Ho Lee, Sung Jin Kim, Hyo Sook Jeong
WWW
2010
ACM
14 years 2 months ago
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
WSDM
2010
ACM
204views Data Mining» more  WSDM 2010»
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
AIRWEB
2006
Springer
13 years 11 months ago
Improving Cloaking Detection using Search Query Popularity and Monetizability
Cloaking is a search engine spamming technique used by some Web sites to deliver one page to a search engine for indexing while serving an entirely different page to users browsin...
Kumar Chellapilla, David Maxwell Chickering