The paper presents an approach to combine knowledge from memory and brain sciences with information retrieval research in the design of Web agents. An information retrieval agent f...
The research reported in this paper is the first phase of a larger project on the automatic classification of web pages by their genres, using ngram representations of the web pag...
In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish d...
: ? Towards Combining Web Classification and Web Information Extraction: a Case Study Ping Luo, Fen Lin, Yuhong Xiong, Yong Zhao, Zhongzhi Shi HP Laboratories HPL-2009-86 Classific...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classific...