Improved POS tagging for text-to-speech synthesis

One of the fundamental building blocks of text processing for text-to-speech (TTS) synthesis is the assignment of a part-of-speech (POS) tag to each input word. POS tags are heavily relied upon for downstream natural language analysis and prosody rendering. Conventional TTS POS tagging tends to resort to detailed handcrafted rules that can accommodate TTS specificities such as pertinent prosodic features, while mainstream tagging increasingly relies on data-driven statistical models trained on large but fairly generic corpora. This paper proposes a new strategy, hybrid POS tagging, which integrates these two approaches in order to achieve higher tagging accuracy. The resulting framework combines the TTS-specific advantage of rule-based tagging with the inherent robustness of broadly-trained statistical tagging. Empirical evidence underscores the viability of this framework for improving TTS quality, e.g., in regard to phrase boundary placement and homograph selection.
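
The abstract does not say how the rule-based and statistical components are integrated, so the following is only a minimal sketch of one possible arrangement, not the authors' method. It assumes a toy unigram (most-frequent-tag) model as the statistical tagger and a single handcrafted homograph rule that overrides it when it fires. The training data, the tag names, and the function names (`train_unigram`, `statistical_tag`, `rule_tag`, `hybrid_tag`) are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, for illustration only).
TRAINING = [
    [("the", "DET"), ("record", "NOUN"), ("was", "VERB"), ("broken", "VERB")],
    [("a", "DET"), ("new", "ADJ"), ("record", "NOUN")],
    [("they", "PRON"), ("record", "VERB"), ("music", "NOUN")],
    [("we", "PRON"), ("want", "VERB"), ("to", "PART"), ("sing", "VERB")],
]


def train_unigram(corpus):
    """Statistical component: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def statistical_tag(model, words, default="NOUN"):
    """Tag each word independently; unseen words get the default tag."""
    return [model.get(w.lower(), default) for w in words]


def rule_tag(words, i):
    """Rule component: return a tag if a handcrafted rule fires, else None.

    Assumed example rule: the homograph "record" after "to", "will" or
    "can" is a verb, which matters for choosing its pronunciation in TTS.
    """
    word = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else None
    if word == "record" and prev in {"to", "will", "can"}:
        return "VERB"
    return None


def hybrid_tag(model, words):
    """Use the rule's tag where a rule fires, else the statistical tag."""
    base = statistical_tag(model, words)
    return [rule_tag(words, i) or base[i] for i in range(len(words))]


if __name__ == "__main__":
    model = train_unigram(TRAINING)
    sentence = "we want to record music".split()
    print(list(zip(sentence, statistical_tag(model, sentence))))
    print(list(zip(sentence, hybrid_tag(model, sentence))))
```

In this sketch the unigram model alone tags "record" as a noun because that is its most frequent tag in the toy corpus, while the hybrid tagger corrects it to a verb after "to". Letting rules take priority is just one design choice; a real system could equally weight or vote between the two sources.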

Ming Sun, Jerome R. Bellegarda

ICASSP 2011 | Input Word | Natural Language Analysis | Prosodic Features | Signal Processing |

Post Info

Added	29 Aug 2011
Updated	29 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Ming Sun, Jerome R. Bellegarda
