Sciweavers

JMLR
2010

Discriminative Topic Segmentation of Text and Speech

13 years 6 months ago
Discriminative Topic Segmentation of Text and Speech
We explore automated discovery of topicallycoherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms which employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector based description of a window of text or speech observations in word similarity space to overcome noise introduced by speech recognition errors and off-topic content. In experiments over speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is a more difficult problem than that of text streams; however, we demonstrate that by using a lattice of competing hypotheses rather than just the one-best hypothesis as input to the segmentation algorithm, the performance of the algorithm can be improved.
Mehryar Mohri, Pedro Moreno, Eugene Weinstein
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where JMLR
Authors Mehryar Mohri, Pedro Moreno, Eugene Weinstein
Comments (0)