Sciweavers

19 search results (page 2 of 4) related to: Effective web-scale crawling through website analysis
HICSS
2006
IEEE
Barriers to Information Access across Languages on the Internet: Network and Language Effects
This paper investigates the role of language in accessing information on the Internet. We combined data about website visitors through log-file analysis with data about web-hosts ...
Anett Kralisch, Thomas Mandl
WWW
2010
ACM
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs cause serious problems throughout a search engine's pipeline, from crawling and indexing to result serving. URL normalization aims to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
CIKM
2008
Springer
Socialtagger - collaborative tagging for blogs in the long tail
Social bookmarking is the process through which users share tags for online resources like blogs with others. Such collaborative tags provide valuable metadata for retrieval syste...
Shankara B. Subramanya, Huan Liu
WWW
2011
ACM
we.b: the web of short urls
Short URLs have become ubiquitous. Especially popular within social networking services, short URLs have seen a significant increase in their usage over the past years, mostly du...
Demetres Antoniades, Iasonas Polakis, Georgios Kon...
COLING
2010
Mining Large-scale Comparable Corpora from Chinese-English News Collections
In this paper, we explore a CLIR-based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for translation knowledge mining. The initial source...
Degen Huang, Lian Zhao, Lishuang Li, Haitao Yu