Bayesian Unsupervised Topic Segmentation

This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of well-formed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically cohesive segmentation. This contrasts with previous approaches, which relied on hand-crafted cohesion metrics. The Bayesian framework provides a principled way to incorporate additional features such as cue phrases, a powerful indicator of discourse structure that has not previously been used in unsupervised segmentation systems. Our model yields consistent improvements over an array of state-of-the-art systems on both text and speech datasets. We also show that both an entropy-based analysis and...
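
The likelihood-driven idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a symmetric Dirichlet prior over each segment's multinomial (so the language model can be integrated out), a known number of segments, and an exhaustive dynamic program over sentence boundaries. The names `segment_log_likelihood`, `segment`, `alpha`, and `num_segments` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def segment_log_likelihood(counts, vocab_size, alpha=0.1):
    """Log marginal likelihood of a segment's words under a multinomial
    with a symmetric Dirichlet(alpha) prior (an assumption of this sketch)."""
    n = sum(counts.values())
    score = math.lgamma(vocab_size * alpha) - math.lgamma(n + vocab_size * alpha)
    for c in counts.values():
        # Words absent from the segment contribute zero, so they are skipped.
        score += math.lgamma(c + alpha) - math.lgamma(alpha)
    return score


def segment(sentences, num_segments, alpha=0.1):
    """Return the start index of each segment in the segmentation that
    maximizes the summed segment log-likelihoods."""
    vocab_size = len({w for s in sentences for w in s})
    n = len(sentences)
    neg_inf = float("-inf")

    # score[i][j]: log-likelihood of sentences[i:j] treated as one segment.
    score = [[neg_inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        counts = Counter()
        for j in range(i + 1, n + 1):
            counts.update(sentences[j - 1])
            score[i][j] = segment_log_likelihood(counts, vocab_size, alpha)

    # best[k][j]: best total score for splitting sentences[:j] into k segments.
    best = [[neg_inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + score[i][j]
                if cand > best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Follow the back-pointers from the end to recover segment starts.
    starts = []
    j = n
    for k in range(num_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1]
```

Each sentence is passed as a list of tokens, for example `segment([["cat", "dog"], ["dog", "cat"], ["stock", "bond"], ["bond", "stock"]], num_segments=2)`. Segments whose words concentrate on a small vocabulary score higher, which is the sense in which maximizing likelihood favors lexically cohesive segments. Cue-phrase features, which the abstract also mentions, are outside the scope of this sketch.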

Jacob Eisenstein, Regina Barzilay

EMNLP 2008 | Lexical Cohesion | Model Yields | Natural Language Processing | Unsupervised Topic Segmentation

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	EMNLP
Authors	Jacob Eisenstein, Regina Barzilay
