An HMM trajectory tiling (HTT) approach to high quality TTS

We propose an HMM Trajectory Tiling (HTT) approach to high quality TTS, which is our entry to Blizzard Challenge 2010. In HTT, a refined HMM is first trained with the Minimum Generation Error (MGE) criterion; the trajectory generated by the refined HMM then guides the search for the closest waveform segment "tiles" in synthesis. Normalized distances between the HMM trajectory and those of the waveform unit candidates are used to select the final candidates in a unit sausage (lattice). Normalized cross-correlation, a good concatenation measure because of its high relevance to spectral similarity, phase continuity and concatenation time instants, is used to find the best unit sequence in the sausage. This sequence serves as the best segment tiles to closely follow the HMM trajectory guide. Tested in four tasks of Blizzard Challenge 2010 (EH1, EH2, MH1 and MH2), the new HTT approach delivers high quality, natural sounding TTS speech without sacrificing high intelligibility. Su...
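The two selection stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the candidate layout (dictionaries with hypothetical `feats`, `head` and `tail` keys), and the use of a plain mean Euclidean distance in place of the paper's normalized distance are all assumptions made for illustration. Candidates are first pruned by their distance to the guide trajectory, and a Viterbi search then picks the path through the sausage whose joins have the highest summed normalized cross-correlation.

```python
import math

def ncc(x, y):
    """Normalized cross-correlation of two equal-length sample windows."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0

def trajectory_distance(guide, feats):
    """Mean per-frame Euclidean distance (a stand-in for the paper's
    normalized distance, whose exact form is not given in the abstract)."""
    return sum(math.dist(g, f) for g, f in zip(guide, feats)) / len(guide)

def prune(guide, candidates, keep):
    """Keep the `keep` candidates whose features lie closest to the guide."""
    ranked = sorted(candidates, key=lambda c: trajectory_distance(guide, c["feats"]))
    return ranked[:keep]

def best_sequence(sausage):
    """Viterbi search over the sausage, maximizing the summed NCC at joins.

    `sausage` is a list of slots; each slot is a list of candidates with
    hypothetical "tail" and "head" sample windows. Returns one candidate
    index per slot.
    """
    score = [0.0] * len(sausage[0])   # best cumulative score per candidate
    back = []                         # back-pointers, one list per join
    for prev, cur in zip(sausage, sausage[1:]):
        new_score, pointers = [], []
        for cand in cur:
            options = [score[i] + ncc(p["tail"], cand["head"])
                       for i, p in enumerate(prev)]
            best = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[best])
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best-scoring candidate in the last slot.
    j = max(range(len(score)), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path
```

In this sketch the join score compares a fixed tail window with a fixed head window. A fuller version would presumably also search over time lags to choose the concatenation instant, which the abstract mentions but does not specify.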

Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong

Blizzard Challenge | HMM Trajectory | HMM Trajectory Guide | INTERSPEECH 2010 | Signal Processing

Added	18 May 2011
Updated	18 May 2011
Type	Journal
Year	2010
Where	INTERSPEECH
Authors	Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong
