Sciweavers

295 search results - page 33 / 59
» Web Crawling
Sort
View
CEAS
2007
Springer
14 years 4 months ago
Characterizing Web Spam Using Content and HTTP Session Analysis
Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...
Steve Webb, James Caverlee, Calton Pu
WEBDB
2005
Springer
102views Database» more  WEBDB 2005»
14 years 3 months ago
Design and Implementation of a Geographic Search Engine
In this paper, we describe the design and initial implementation of a geographic search engine prototype for Germany, based on a large crawl of the de domain. Geographic search en...
Alexander Markowetz, Yen-Yu Chen, Torsten Suel, Xi...
WWW
2009
ACM
14 years 10 months ago
Detecting soft errors by redirection classification
A soft error redirection is a URL redirection to a page that returns the HTTP status code 200 (OK) but has actually no relevant content to the client request. Since such redirecti...
Taehyung Lee, Jinil Kim, Jin Wook Kim, Sung-Ryul K...
WWW
2007
ACM
14 years 10 months ago
Towards Deeper Understanding of the Search Interfaces of the Deep Web
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying ...
Hai He, Weiyi Meng, Yiyao Lu, Clement T. Yu, Zongh...
AIRS
2008
Springer
14 years 4 months ago
On the Construction of a Large Scale Chinese Web Test Collection
The lack of a large scale Chinese test collection is an obstacle to the Chinese information retrieval development. In order to address this issue, we built such a collection compos...
Hongfei Yan, Chong Chen, Bo Peng, Xiaoming Li