RIAO
2004

Unsupervised Learning with Term Clustering for Thematic Segmentation of Texts

In this paper we introduce a machine learning approach to automatic text segmentation. Our text segmenter clusters text segments that contain similar concepts. It first discovers the concepts present in a text, each concept being defined as a set of representative terms. The text is then partitioned into coherent paragraphs using a clustering technique based on the Classification Maximum Likelihood approach. We evaluate the effectiveness of this technique on sets of concatenated paragraphs from two collections, the 7sectors and 20 Newsgroups corpora, and compare it to a baseline text segmentation technique proposed by Salton et al.
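The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mixture model, the initialisation, or how concepts are discovered. The sketch assumes the concepts are already available as sets of terms, represents each text segment by its concept counts, and groups segments with Classification EM (the hard-assignment procedure commonly used to maximise the Classification Maximum Likelihood criterion) over a smoothed multinomial mixture. The names `concept_counts` and `cem_multinomial`, the contiguous-block initialisation, and the `smoothing` parameter are all hypothetical.

```python
import math


def concept_counts(tokens, concepts):
    """Count how many tokens of a segment fall into each concept (a set of terms)."""
    return [sum(1 for t in tokens if t in concept) for concept in concepts]


def cem_multinomial(counts, k, n_iter=20, smoothing=1.0):
    """Classification EM for a multinomial mixture (assumed model, see lead-in).

    counts: one concept-count vector per text segment, all of equal length.
    Returns one cluster label per segment.
    """
    n, d = len(counts), len(counts[0])
    # Assumed initialisation: split the segments into k contiguous blocks.
    labels = [i * k // n for i in range(n)]
    for _ in range(n_iter):
        # M-step: smoothed cluster priors and multinomial parameters.
        log_prior, log_theta = [], []
        for c in range(k):
            members = [x for x, z in zip(counts, labels) if z == c]
            log_prior.append(math.log((len(members) + smoothing) / (n + k * smoothing)))
            totals = [smoothing + sum(x[j] for x in members) for j in range(d)]
            norm = sum(totals)
            log_theta.append([math.log(t / norm) for t in totals])
        # E-step and C-step: hard-assign each segment to its most likely cluster.
        new_labels = []
        for x in counts:
            scores = [
                log_prior[c] + sum(x[j] * log_theta[c][j] for j in range(d))
                for c in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:  # the partition is stable, so stop
            break
        labels = new_labels
    return labels


# Toy usage: two hypothetical concepts and six short segments.
concepts = [{"goal", "match", "team"}, {"vote", "election", "party"}]
segments = [
    "goal match team goal team".split(),
    "vote election party vote party".split(),
    "match team goal team vote".split(),
    "election party vote vote goal".split(),
    "team team goal match goal match".split(),
    "party vote election vote party election".split(),
]
vectors = [concept_counts(s, concepts) for s in segments]
labels = cem_multinomial(vectors, k=2)
```

In this toy run, segments that share a dominant concept end up with the same label. A real segmenter would also need the concept-discovery step and a way to turn cluster labels into contiguous boundaries, neither of which the abstract details.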
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where RIAO
Authors Marc Caillet, Jean-François Pessiot, Massih-Reza Amini, Patrick Gallinari