Abstract. In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system n...
Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search ...
We propose a method for discovering the dependency relationships between the topics of documents shared in social networks using the latent social interactions, attempting to answ...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document's tag ratios. We describe how to comput...
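To make the notion of a tag ratio concrete, the following is a minimal sketch and not the authors' implementation. It assumes a per-line ratio defined as the number of non-tag characters divided by the number of HTML tags on that line, falling back to the text length when a line contains no tags. The names `TAG_RE` and `tag_ratios` are hypothetical, and the regular expression is only a rough tag matcher, not an HTML parser.

```python
import re
from typing import List

# Rough matcher for HTML tags (assumption: good enough for a sketch,
# it does not handle comments, scripts, or '>' inside attribute values).
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> List[float]:
    """Return one text-to-tag ratio per line of the input."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        # Characters left after removing the tags, ignoring outer whitespace.
        text_len = len(TAG_RE.sub("", line).strip())
        if tag_count:
            ratios.append(text_len / tag_count)
        else:
            # Assumed convention: with no tags, use the text length itself.
            ratios.append(float(text_len))
    return ratios


if __name__ == "__main__":
    page = (
        "<div><a href='x'>Home</a></div>\n"
        "<p>This is a long paragraph of real article content.</p>\n"
        "plain text"
    )
    for ratio in tag_ratios(page):
        print(ratio)
```

Under this assumed definition, markup-heavy lines such as navigation links get low ratios and text-heavy lines get high ones, which is the signal a content extractor could then threshold or cluster.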
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...