Sciweavers

ICA
2007
Springer

Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses

Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, document representations based on latent thematic generative models can lead to improved clustering. However, determining the optimal document indexing technique a priori is not straightforward, as it depends on the clustering problem faced and the partitioning strategy adopted. To overcome this indeterminacy, we propose deriving a single consensus labeling from the results of clustering processes executed on several document representations. Experiments conducted on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering for overcoming the indeterminacy regarding optimal document indexing.
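The abstract does not say which consensus function the authors use. As a rough illustration of the general idea of combining several clusterings of the same documents into one labeling, here is a minimal sketch of one common approach, a co-association (evidence accumulation) consensus: each pair of documents is linked if it shares a cluster in more than a `threshold` fraction of the input labelings, and the connected components become the consensus clusters. The function names, the threshold rule, and the example labelings are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_association_counts(labelings):
    """Count, for each document pair (i, j), how many labelings co-cluster them."""
    n = len(labelings[0])
    counts = {}
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same documents")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return n, counts


def consensus_labels(labelings, threshold=0.5):
    """Hypothetical consensus function: link pairs co-clustered in more than
    `threshold` of the labelings, then return connected components as clusters."""
    n, counts = co_association_counts(labelings)
    m = len(labelings)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), c in counts.items():
        if c / m > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical labelings of six documents, e.g. from clustering three
    # different document representations.
    runs = [
        [0, 0, 1, 1, 2, 2],
        [1, 1, 0, 0, 0, 2],
        [0, 0, 0, 1, 1, 1],
    ]
    print(consensus_labels(runs))
```

With these inputs, documents 0 and 1 are co-clustered in all three runs, and the pairs (2, 3), (3, 4), and (4, 5) each agree in two of three runs. The consensus therefore is `[0, 0, 1, 1, 1, 1]`. Thresholded co-association is only one option; other consensus functions (for example, graph- or hypergraph-based ones) would combine the same inputs differently.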
Added 16 Aug 2010
Updated 16 Aug 2010
Type Conference
Year 2007
Where ICA
Authors Xavier Sevillano, Germán Cobo, Francesc Alías, Joan Claudi Socoró