Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
Due to resource constraints, Web archiving systems and search engines usually have difficulties keeping the entire local repository synchronized with the Web. We advance the state...
Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. L...
In most web sites, web-based applications (such as web portals, emarketplaces, search engines), and in the file systems of personal computers, a wide variety of schemas (such as t...
Paolo Bouquet, Luciano Serafini, Stefano Zanobini,...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
The research in information extraction (IE) regards the generation of wrappers that can extract particular information from semistructured Web documents. Similar to compiler gener...