Sciweavers

DEXAW
2010
IEEE

Identifying Sentence-Level Semantic Content Units with Topic Models

Abstract: Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the performance of probabilistic topic models on the task of learning sentence-level topics that are similar to facts. The identification of sentential content with the same meaning is an important task in multi-document summarization and the evaluation of multi-document summaries. In our approach, each sentence is represented as a distribution over topics, and each topic is a distribution over words. We compare the topic-sentence assignments discovered by a topic model to gold-standard assignments that were manually annotated on a set of closely related pairs of news articles. We observe a clear correspondence between automatically identified and annotated topics. The high accuracy of automatically discovered topic-sentence assignments suggests that topic models can be utilized to identify (sub-)sentential semantic conten...
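
The representation described in the abstract (sentences as distributions over topics, topics as distributions over words) can be illustrated with a minimal sketch. This is not the authors' implementation: the topic names, sentence identifiers, vocabulary, and all probabilities below are invented for illustration, and the helper functions `word_probability` and `dominant_topic` are hypothetical.

```python
# Toy illustration only: every number here is invented, not taken from the paper.

# Each topic is a probability distribution over words.
topic_word = {
    "t0": {"earthquake": 0.6, "magnitude": 0.3, "rescue": 0.1},
    "t1": {"earthquake": 0.1, "magnitude": 0.1, "rescue": 0.8},
}

# Each sentence is a probability distribution over topics.
sentence_topic = {
    "s0": {"t0": 0.9, "t1": 0.1},
    "s1": {"t0": 0.2, "t1": 0.8},
}


def word_probability(sentence_id, word):
    """p(word | sentence) = sum over topics of p(word | topic) * p(topic | sentence)."""
    return sum(
        topic_word[topic].get(word, 0.0) * weight
        for topic, weight in sentence_topic[sentence_id].items()
    )


def dominant_topic(sentence_id):
    """Return the topic with the highest probability for the given sentence."""
    dist = sentence_topic[sentence_id]
    return max(dist, key=dist.get)


# A hard topic-sentence assignment, the kind of output that could be
# compared against manually annotated assignments.
assignments = {s: dominant_topic(s) for s in sentence_topic}
```

Taking the most probable topic per sentence is only one possible way to turn the distributions into topic-sentence assignments; the listing does not say which rule the paper uses.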
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2010
Where DEXAW
Authors Leonhard Hennig, Thomas Strecker, Sascha Narr, Ernesto William De Luca, Sahin Albayrak