A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Y...
Image annotation has been an active research topic in recent years due to its potentially large impact on both image understanding and Web image search. In this paper, we target a...
Xirong Li, Le Chen, Lei Zhang, Fuzong Lin, Wei-Yin...
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying ...
Hai He, Weiyi Meng, Yiyao Lu, Clement T. Yu, Zongh...
We present the design of Dynabot, a guided Deep Web discovery system. Dynabot's modular architecture supports focused crawling of the Deep Web with an emphasis on matching, p...
Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...