The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional ...
In this paper, we propose a novel document clustering method based on the non-negative factorization of the termdocument matrix of the given document corpus. In the latent semanti...
Abstract. This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entites (NEs) such as persons, locations, organiza...
Clustering hypertext document collection is an important task in Information Retrieval. Most clustering methods are based on document content and do not take into account the hype...
Konstantin Avrachenkov, Vladimir Dobrynin, Danil N...
We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization. We developed several heuristics to ...