Sciweavers

85 search results - page 6 / 17
» The impact of crawl policy on web search effectiveness
Sort
View
KDD
2008
ACM
183views Data Mining» more  KDD 2008»
14 years 9 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
JIS
2008
119views more  JIS 2008»
13 years 8 months ago
A three-year study on the freshness of web search engine databases
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Y...
Dirk Lewandowski
MM
2006
ACM
167views Multimedia» more  MM 2006»
14 years 2 months ago
Image annotation by large-scale content-based image retrieval
Image annotation has been an active research topic in recent years due to its potentially large impact on both image understanding and Web image search. In this paper, we target a...
Xirong Li, Le Chen, Lei Zhang, Fuzong Lin, Wei-Yin...
WWW
2007
ACM
14 years 9 months ago
Towards Deeper Understanding of the Search Interfaces of the Deep Web
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying ...
Hai He, Weiyi Meng, Yiyao Lu, Clement T. Yu, Zongh...
WWW
2005
ACM
14 years 9 months ago
Exploiting the deep web with DynaBot: matching, probing, and ranking
We present the design of Dynabot, a guided Deep Web discovery system. Dynabot's modular architecture supports focused crawling of the Deep Web with an emphasis on matching, p...
Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...