Sciweavers

ICCSA
2005
Springer
On URL Normalization
Since syntactically different URLs can represent the same resource on the WWW, there are ongoing efforts in the standards communities to define URL normalization. This paper cons...
Sang Ho Lee, Sung Jin Kim, Seok-Hoo Hong
HUMAN
2005
Springer
How to Evaluate the Effectiveness of URL Normalizations
Syntactically different URLs can represent the same web page on the World Wide Web, and duplicate representations of web pages cause web applications to handle a large amount of...
Sang Ho Lee, Sung Jin Kim, Hyo Sook Jeong
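One simple way to think about evaluating a normalization is to compare the number of distinct URLs before and after it is applied, and to check whether URLs that collapse to the same string really point at the same content. The Python sketch below does this under stated assumptions and is not the evaluation method of the paper above: each crawled URL is assumed to come with a content fingerprint (for example, a hash of the page body), and `evaluate_normalization` and the toy rule `lowercase_scheme_and_authority` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple
from urllib.parse import urlsplit, urlunsplit


def lowercase_scheme_and_authority(url: str) -> str:
    """Toy normalization for illustration: lowercase the scheme and authority only."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path, p.query, p.fragment))


def evaluate_normalization(
    pairs: Iterable[Tuple[str, str]],
    normalize: Callable[[str], str],
) -> Dict[str, float]:
    """pairs holds (url, content_fingerprint); returns simple effectiveness counts."""
    originals: Set[str] = set()
    fps_by_norm: Dict[str, Set[str]] = defaultdict(set)  # normalized URL -> fingerprints
    norms_by_fp: Dict[str, Set[str]] = defaultdict(set)  # fingerprint -> normalized URLs
    for url, fp in pairs:
        originals.add(url)
        norm = normalize(url)
        fps_by_norm[norm].add(fp)
        norms_by_fp[fp].add(norm)
    before, after = len(originals), len(fps_by_norm)
    return {
        "distinct_before": before,
        "distinct_after": after,
        # Fraction of distinct URL strings eliminated by the normalization.
        "reduction": (1 - after / before) if before else 0.0,
        # Normalized URLs that merged pages with different content (false merges).
        "false_merges": sum(1 for fps in fps_by_norm.values() if len(fps) > 1),
        # Extra normalized URLs that still point at the same content.
        "remaining_duplicates": sum(len(n) - 1 for n in norms_by_fp.values()),
    }
```

The reduction figure on its own rewards aggressive rules: a rule that mapped every URL to a single string would score close to 1 while merging unrelated pages, so it is only meaningful next to the false-merge count.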
WWW
2010
ACM
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs cause serious problems throughout a search engine's pipeline, from crawling and indexing to result serving. URL normalization transforms duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
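The paper above learns URL normalization rules with a pattern tree; the Python sketch below does not reproduce that method. It only illustrates what applying one such rule might look like once it is known, assuming (hypothetically) that the query parameters `sid`, `sessionid`, and `utm_source` never change page content and that parameter order is irrelevant. `apply_param_rule` and `IRRELEVANT_PARAMS` are illustrative names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule: parameters assumed not to affect the returned page.
IRRELEVANT_PARAMS = frozenset({"sid", "sessionid", "utm_source"})


def apply_param_rule(url: str, irrelevant: frozenset = IRRELEVANT_PARAMS) -> str:
    """Drop irrelevant query parameters and sort the rest into a canonical order."""
    p = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(p.query, keep_blank_values=True)
        if k.lower() not in irrelevant
    ]
    kept.sort()  # assumes parameter order never matters on this site
    # urlencode re-encodes the values, so e.g. "%20" comes back as "+".
    return urlunsplit((p.scheme, p.netloc, p.path, urlencode(kept), p.fragment))
```

Under this rule, both `http://example.com/item?sid=123&id=7` and `http://example.com/item?id=7&sid=999` map to `http://example.com/item?id=7`. Whether such a rule is safe depends entirely on the site, which is why learning rules from observed duplicates, rather than hard-coding them, is attractive.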