Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses


Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, document representations based on latent thematic generative models can lead to improved clustering. However, determining a priori the optimal document indexing technique is not straightforward, as it depends on the clustering problem faced and the partitioning strategy adopted. To overcome this indeterminacy, we propose deriving a single consensus labeling from the results of clustering processes executed on several document representations. Experiments conducted on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering for overcoming the indeterminacy regarding optimal document indexing.
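The idea of merging several clusterings into one consensus labeling can be illustrated with a small sketch. The abstract does not say which consensus function the authors use, so the code below swaps in a simple co-association (evidence accumulation) scheme as a stand-in: two documents are linked when they share a cluster in more than half of the input labelings, and the connected components of those links become the consensus clusters. The function names, the 0.5 threshold, and the toy labelings are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of documents shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Hypothetical consensus step: link documents co-clustered in more than
    `threshold` of the labelings, then label the connected components."""
    n = len(labelings[0])
    coassoc = co_association(labelings)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy labelings of six documents, imagined as coming from three different
# document representations (made-up data, not results from the paper).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
]
print(consensus_labels(labelings))
```

Cluster label names need not agree across the input labelings, since only pairwise co-membership is used. In this toy case two of the three labelings agree on the split between the first three and last three documents, so the consensus follows that majority.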

Xavier Sevillano, Germán Cobo, Francesc Alías, Joan Claudi Socoró


Document | Document Representations | ICA 2007 | Optimal Document Indexing | Signal Processing |


Added	16 Aug 2010
Updated	16 Aug 2010
Type	Conference
Year	2007
Where	ICA
Authors	Xavier Sevillano, Germán Cobo, Francesc Alías, Joan Claudi Socoró
