Hierarchic document clustering has been widely applied to Information Retrieval (IR) on the grounds of its potential improved effectiveness over inverted file search. However, pre...
Anastasios Tombros, Robert Villa, C. J. van Rijsbe...
A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily be thousands of words. On the other hand, ...
This paper investigates the applicability of distributed clustering technique, called RACHET [1], to organize large sets of distributed text data. Although the authors of RACHET c...
Large, high dimensional data spaces, are still a challenge for current data clustering methods. Frequent Termset (FTS) clustering is a technique developed to cope with these chall...
This paper proposes a ping-pong document clustering method using NMF and the linkage based refinement alternately, in order to improve the clustering result of NMF. The use of NMF...
Document understanding techniques such as document clustering and multi-document summarization have been receiving much attention in recent years. Current document clustering meth...
Dingding Wang, Shenghuo Zhu, Tao Li, Yun Chi, Yiho...
We present an approach to document clustering based on winnowing fingerprints that achieved good values of effectiveness with considerable save in memory space and computation tim...
In this paper we propose an extension of the PLSA model in which an extra latent variable allows the model to cocluster documents and terms simultaneously. We show on three datase...
Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming on-line sources, such as, Newswire and Blogs. However, ...
Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant Colony Optimization (ACO) is one such algorithm based on s...