Sciweavers

ICONIP
2009

Text Mining with an Augmented Version of the Bisecting K-Means Algorithm

The number of electronic documents available today keeps growing, and organizing and categorizing this corpus has become too large a task to perform by analog means. In this paper, we propose an augmented version of the bisecting k-means clustering algorithm for automated text categorization tasks. Our augmented version adds (1) a bootstrap aggregating procedure, (2) a bisecting criterion that relies on the dispersion of data within clusters, and (3) a method to automatically terminate the algorithm once an optimal number of clusters has been produced. We performed text categorization experiments to compare our algorithm against the standard bisecting k-means and k-means algorithms. The results showed that our augmented version improved classification accuracy by approximately 15% and 20% compared to the standard bisecting k-means and k-means, respectively.
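The abstract does not give the exact dispersion measure, the bootstrap aggregating procedure, or the termination rule, so the following is only a minimal sketch of plain bisecting k-means with a dispersion-based choice of which cluster to split. It rests on several assumptions that are not from the paper: dispersion is taken to be the within-cluster sum of squared distances to the centroid, the loop stops at a caller-supplied number of clusters `k` rather than detecting an optimal number automatically, bagging is omitted, and all function names (`bisecting_kmeans`, `two_means`, `dispersion`) are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dispersion(points):
    """Within-cluster sum of squared distances to the centroid.

    Assumption: the paper's actual dispersion measure is not stated in the abstract.
    """
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster with plain 2-means; returns a pair of point lists.

    One of the two lists may be empty if the split degenerates.
    """
    centers = rng.sample(points, 2)
    halves = ([], [])
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break
        new_centers = [centroid(h) for h in halves]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return halves


def bisecting_kmeans(points, k, seed=0, trials=5):
    """Repeatedly bisect the most dispersed cluster until k clusters exist.

    Assumption: a fixed k stands in for the paper's automatic termination.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Pick the cluster whose points are most spread out.
        idx = max(range(len(clusters)), key=lambda i: dispersion(clusters[i]))
        target = clusters[idx]
        if len(target) < 2:
            break  # nothing left that can be split
        # Try several random 2-means splits and keep the tightest one.
        best = None
        for _ in range(trials):
            a, b = two_means(target, rng)
            if a and b:
                cost = dispersion(a) + dispersion(b)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        if best is None:
            break  # every trial degenerated, e.g. all points identical
        clusters[idx:idx + 1] = [best[1], best[2]]
    return clusters
```

For text categorization, each point would be a document vector (for example, term weights), and a bagging variant would presumably run such a clustering on several bootstrap resamples and combine the results; how the paper combines them is not described in the abstract.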
Yutaro Hatagami, Toshihiko Matsuka
Added 19 Feb 2011
Updated 19 Feb 2011
Type Journal
Year 2009
Where ICONIP