Roles of the average voice in speaker-adaptive HMM-based speech synthesis



In speaker-adaptive HMM-based speech synthesis, the synthetic speech of a few speakers sounds worse than that of other speakers, even though all speakers have the same amount of adaptation data drawn from the same corpus. This paper investigates these fluctuations in quality and finds that, as the mel-cepstral distance from the average voice becomes larger, the mean opinion scores (MOS) generally become worse. Although the negative correlation obtained is not very strong, the finding helps us improve the training and adaptation strategies for average voice models. Furthermore, we remark that this correlation is strongly linked to "vocal attractiveness."
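The relationship described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the mel-cepstral distance was computed, so the code below assumes the commonly used mel-cepstral distortion formula in dB over time-aligned frames with the 0th (energy) coefficient excluded. The function names `mel_cepstral_distance` and `pearson`, and all numbers in the toy example, are hypothetical and are not taken from the paper.

```python
import math


def mel_cepstral_distance(frames_a, frames_b):
    """Mean mel-cepstral distortion (dB) between two frame sequences.

    Assumption: both sequences are already time-aligned and have the same
    length; each frame is a list of mel-cepstral coefficients with the
    0th (energy) coefficient removed.
    """
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("sequences must be non-empty and of equal length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for a, b in zip(frames_a, frames_b):
        squared = sum((x - y) ** 2 for x, y in zip(a, b))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(frames_a)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy illustration only: invented per-speaker distances from the average
# voice (dB) and invented MOS values, not results from the paper.
toy_distances = [4.8, 5.1, 5.6, 6.0, 6.7]
toy_mos = [3.4, 3.5, 3.0, 3.1, 2.6]
r = pearson(toy_distances, toy_mos)  # negative when MOS falls as distance grows
```

A negative `r` here corresponds to the trend the abstract reports: speakers farther from the average voice tend to receive lower MOS. In practice the frames would first need to be aligned (for example by dynamic time warping), which this sketch leaves out.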

Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev


Average Voice | Average Voice Models | INTERSPEECH 2010 | Signal Processing | Speaker-adaptive HMM-based Speech


Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Conference
Year	2010
Where	INTERSPEECH
Authors	Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev
