This paper investigates the role of language in accessing information on the Internet. We combined data about website visitors, obtained through log-file analysis, with data about web hosts ...
Duplicate URLs cause serious problems throughout a search engine's pipeline, from crawling and indexing to result serving. URL normalization aims to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Social bookmarking is the process by which users share tags for online resources, such as blogs, with others. Such collaborative tags provide valuable metadata for retrieval syste...
Short URLs have become ubiquitous. Especially popular within social networking services, short URLs have seen a significant increase in their usage over the past years, mostly du...
In this paper, we explore a CLIR-based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for mining translation knowledge. The initial source...