Using latent topic features to improve binary classification of spoken documents

In many topic identification applications, supervised training labels are only indirectly related to the semantic content of the documents being classified. For example, many topically distinct emails will all be assigned a single broad category label of “spam” or “not-spam”, and a two-class classifier will lack direct knowledge of the underlying topic structure. This paper examines the degradation of topic identification performance on conversational speech when multiple semantic topics are combined into a single broad category. We then develop techniques using document clustering and Latent Dirichlet Allocation (LDA) to exploit the underlying semantic topics; these techniques improve performance over classifiers trained on the single category label by up to 20%.
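The clustering idea in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the paper's method: the abstract does not specify the features, clustering algorithm, or classifier. Here we assume whitespace-tokenized transcripts represented as term-frequency vectors, spherical k-means to split the broad positive class into `k_pos` sub-topic clusters, and a nearest-centroid classifier whose sub-topic decision is collapsed back to the binary label. The names `tf_vector`, `spherical_kmeans`, `train`, and `predict` are invented for this illustration.

```python
import math
import random
from collections import Counter


def tf_vector(doc):
    """Unit-length term-frequency vector (sparse dict) for a whitespace-tokenized transcript."""
    counts = Counter(doc.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def centroid(vectors):
    """Length-normalized mean direction of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds values key by key
    norm = math.sqrt(sum(x * x for x in total.values()))
    return {w: x / norm for w, x in total.items()} if norm else {}


def spherical_kmeans(vectors, k, iters=10, seed=0):
    """Cluster unit vectors into k groups by cosine similarity; requires k <= len(vectors)."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda j: cosine(v, centers[j]))
            clusters[best].append(v)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


def train(docs, labels, k_pos):
    """Split the broad positive class (label 1) into k_pos hypothetical sub-topic clusters.

    Returns a list of (centroid, binary_label) pairs; k_pos=1 is the baseline
    that only sees the single broad category label.
    """
    vecs = [tf_vector(d) for d in docs]
    pos = [v for v, y in zip(vecs, labels) if y == 1]
    neg = [v for v, y in zip(vecs, labels) if y == 0]
    pos_centers = spherical_kmeans(pos, k_pos)
    return [(c, 1) for c in pos_centers] + [(centroid(neg), 0)]


def predict(model, doc):
    """Pick the nearest sub-topic centroid, then report that centroid's binary label."""
    v = tf_vector(doc)
    _, label = max(model, key=lambda entry: cosine(v, entry[0]))
    return label
```

For example, calling `train(docs, labels, k_pos=2)` on a positive class that mixes sports and cooking transcripts yields two positive centroids plus one negative centroid, while `k_pos=1` reproduces a classifier that sees only the broad label. An LDA-based variant could instead use per-document topic proportions as the feature vectors; that substitution is an assumption, not a detail given in the abstract.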

Jonathan Wintrode

ICASSP 2011 | Semantic Topics | Signal Processing | Single Broad Category | Topic Identification

Post Info

Added	21 Aug 2011
Updated	21 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Jonathan Wintrode

Sciweavers