Sciweavers

KDD
2008
ACM

SAIL: summation-based incremental learning for information-theoretic clustering

Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which performs K-means clustering with the KL-divergence as the proximity function. While expert efforts on INFO-K-means have shown promising results, a remaining challenge is to deal with high-dimensional sparse data. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional sparse data. This leads to infinite KL-divergence values, which create a dilemma in assigning objects to the centroids during the iteration process of K-means. To meet this dilemma, in this paper, we propose a Summation-based Incremental Learning (SAIL) method for INFO-K-means clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of the KL-divergence by the computation of the Shannon entropy. This can avoid the zero-value dilemma caused by the use of the KL-divergence. ...
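The zero-value dilemma described in the abstract can be illustrated with a small sketch. The abstract does not spell out the equivalent objective function, so the code below assumes the standard identity for equally weighted distributions: the sum of KL-divergences from cluster members to their mean equals n * H(mean) minus the sum of the members' entropies. The toy data and the helper names (`entropy`, `kl`, `centroid`, `entropy_cost`) are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (natural log); terms with zero probability contribute 0."""
    return -sum(v * math.log(v) for v in p if v > 0)

def kl(p, q):
    """KL-divergence D(p || q); infinite when q is zero where p is positive."""
    total = 0.0
    for pv, qv in zip(p, q):
        if pv > 0:
            if qv == 0:
                return math.inf
            total += pv * math.log(pv / qv)
    return total

def centroid(points):
    """Arithmetic mean of equally weighted distributions."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def entropy_cost(points):
    """Assumed entropy form of the within-cluster objective:
    n * H(centroid) - sum of member entropies."""
    return len(points) * entropy(centroid(points)) - sum(entropy(x) for x in points)

# Hypothetical sparse toy data: each row is a normalized feature vector.
cluster_a = [[0.5, 0.5, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
cluster_b = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 1.0, 0.0]]

m_a = centroid(cluster_a)

# Comparing an object with a centroid that has zeros where the object
# is positive gives an infinite KL-divergence.
cross = kl(cluster_b[0], m_a)

# Within a cluster, the KL sum and the entropy form agree.
kl_sum = sum(kl(x, m_a) for x in cluster_a)
entropy_form = entropy_cost(cluster_a)

# The entropy form stays finite even when the foreign object is added.
merged_cost = entropy_cost(cluster_a + [cluster_b[0]])

print(cross, kl_sum, entropy_form, merged_cost)
```

Because the entropy form never divides by a centroid component, a candidate reassignment can be scored by the change in `entropy_cost` rather than by a direct KL comparison. This is only one plausible reading of the abstract, not a reproduction of the SAIL algorithm.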
Junjie Wu, Hui Xiong, Jian Chen
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2008
Where KDD
Authors Junjie Wu, Hui Xiong, Jian Chen