Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not ...
We introduce the problem of query decomposition, where we are given a query and a document retrieval system, and we want to produce a small set of queries whose union of resulting...
Francesco Bonchi, Carlos Castillo, Debora Donato, ...
In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve be...