ICASSP
2011
IEEE

Improved POS tagging for text-to-speech synthesis

One of the fundamental building blocks of text processing for text-to-speech (TTS) synthesis is the assignment of a part-of-speech (POS) tag to each input word. POS tags are heavily relied upon for downstream natural language analysis and prosody rendering. Conventional TTS POS tagging tends to resort to detailed handcrafted rules that can accommodate TTS specificities such as pertinent prosodic features, while mainstream tagging increasingly relies on data-driven statistical models trained on large but fairly generic corpora. This paper proposes a new strategy, hybrid POS tagging, which integrates these two approaches in order to achieve higher tagging accuracy. The resulting framework combines the TTS-specific advantage of rule-based tagging with the inherent robustness of broadly trained statistical tagging. Empirical evidence underscores the viability of this framework for improving TTS quality, e.g., with regard to phrase boundary placement and homograph selection.
Ming Sun, Jerome R. Bellegarda
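
The abstract does not say how the rule-based and statistical taggers are integrated, so the following is only a minimal sketch of one possible arrangement, not the authors' method. It assumes a simple override scheme: a handcrafted rule decides the tag when it fires, and a statistical tagger supplies the tag otherwise. The lexicon `UNIGRAM_TAGS`, the functions `statistical_tag`, `rule_tag`, and `hybrid_tag`, and the single homograph rule for "record" are all hypothetical, and the toy lexicon merely stands in for a model trained on a large corpus.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon standing in for a broadly trained statistical
# tagger: each known word maps to a single default tag.
UNIGRAM_TAGS = {
    "the": "DT", "a": "DT", "to": "TO", "i": "PRP", "she": "PRP",
    "want": "VBP", "broke": "VBD", "record": "NN", "song": "NN",
}


def statistical_tag(words: List[str]) -> List[str]:
    """Assign each word its default tag; unknown words fall back to NN."""
    return [UNIGRAM_TAGS.get(w.lower(), "NN") for w in words]


def rule_tag(words: List[str], i: int) -> Optional[str]:
    """Handcrafted rule for the homograph 'record' (illustrative only).

    Returns a tag when the rule fires, or None to defer to the
    statistical tagger.
    """
    word = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else ""
    if word == "record":
        if prev == "to":
            return "VB"   # "to record": verb reading
        if prev in {"the", "a"}:
            return "NN"   # "the record": noun reading
    return None


def hybrid_tag(words: List[str]) -> List[Tuple[str, str]]:
    """Use the rule's tag where a rule fires, else the statistical tag."""
    base = statistical_tag(words)
    tagged = []
    for i, word in enumerate(words):
        override = rule_tag(words, i)
        tagged.append((word, override if override is not None else base[i]))
    return tagged


if __name__ == "__main__":
    print(hybrid_tag("I want to record the song".split()))
    print(hybrid_tag("She broke the record".split()))
```

In this sketch the rule matters for homograph selection: the toy lexicon alone would always tag "record" as a noun, while the rule recovers the verb reading after "to". A real system would presumably weigh the two sources rather than let rules override unconditionally.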
Added 29 Aug 2011
Updated 29 Aug 2011
Type Journal
Year 2011
Where ICASSP