Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-...
This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. ...
The performance of parallel query processing in a cluster of index servers is crucial for modern web search systems. In such a scenario, the response time basically depends on the...
Claudine Santos Badue, Ricardo A. Baeza-Yates, Ber...
Abstract. The original purpose of Web metadata was to protect endusers from possible harmful content and to simplify search and retrieval. However they can also be exploited in mor...
This paper proposes a strategy to reduce the amount of hardware involved in the solution of search engine queries. It proposes using a secondary compact cache that keeps minimal i...