Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to ...
Liangda Li, Ke Zhou, Gui-Rong Xue, Hongyuan Zha, Y...
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
A major difference between corporate intranets and the Internet is that in intranets the barrier for users to create web pages is much higher. This limits the amount and quality o...
Pavel A. Dmitriev, Nadav Eiron, Marcus Fontoura, E...
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link...
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...