Cross-Genre Feature Comparisons for Spoken Sentence Segmentation



Automatic sentence segmentation of spoken language is an important precursor to downstream natural language processing. Previous studies combine lexical and prosodic features, but such approaches can impose significant computational challenges because of the large size of the feature sets. Little is understood about which features most benefit performance, particularly for speech data from different speaking styles. We compare sentence segmentation for speech from broadcast news versus natural multi-party meetings, using identical lexical and prosodic feature sets across genres. Results based on boosting and forward selection for this task show that (1) feature sets can be reduced with little or no loss in performance, and (2) the contribution of different feature types differs significantly by genre. We conclude that more efficient approaches to sentence segmentation and similar tasks can be achieved, especially if genre differences are taken into account.
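Forward selection, one of the two methods named in the abstract, greedily grows a feature set one feature at a time, keeping whichever candidate most improves performance. The sketch below is a minimal, generic illustration under stated assumptions, not the authors' implementation: `forward_select`, the `min_gain` stopping threshold, and the toy scorer with its feature names (`pause_dur`, `pitch_reset`, `word_ngram`, `speaker_turn`) and gain values are all hypothetical. In a real setting the scorer would presumably train and evaluate a classifier (for example a boosted model) on held-out data for each candidate subset.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy forward selection (generic sketch, hypothetical helper).

    Starting from the empty set, repeatedly add the single remaining feature
    that most improves score(subset). Stop when the best improvement is not
    larger than min_gain, or when max_features features have been chosen.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every candidate added to the current subset.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(candidates)
        if cand_score - best <= min_gain:
            break  # no candidate is worth its cost; keep the smaller set
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical toy scorer standing in for a classifier's held-out performance.
GAINS = {"pause_dur": 0.30, "pitch_reset": 0.10, "word_ngram": 0.25, "speaker_turn": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    s = sum(GAINS[f] for f in subset)
    # Made-up redundancy: pitch reset adds little once pause duration is known.
    if "pause_dur" in subset and "pitch_reset" in subset:
        s -= 0.08
    return s


if __name__ == "__main__":
    chosen, value = forward_select(list(GAINS), toy_score, min_gain=0.05)
    print(chosen, round(value, 2))
```

With `min_gain=0.05` the toy run keeps `pause_dur` and `word_ngram` and drops `pitch_reset`, whose small marginal gain is mostly redundant with `pause_dur`. On made-up numbers, this mirrors the kind of feature-set reduction with little loss that the abstract reports.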

Sébastien Cuendet, Dilek Z. Hakkani-Tür


Automatic Sentence Segmentation | Feature Sets | Semantic Computing | SEMCO 2007 | Sentence Segmentation



Added	04 Jun 2010
Updated	04 Jun 2010
Type	Conference
Year	2007
Where	SEMCO
Authors	Sébastien Cuendet, Dilek Z. Hakkani-Tür, Elizabeth Shriberg, James Fung, Benoît Favre
