Sciweavers

ICASSP
2011
IEEE

Vocal attractiveness of statistical speech synthesisers

13 years 4 months ago
Vocal attractiveness of statistical speech synthesisers
Our previous analysis of speaker-adaptive HMM-based speech synthesis methods suggested that there are two possible reasons why average voices can obtain higher subjective scores than any individual adapted voice: 1) model adaptation degrades speech quality proportionally to the distance ‘moved’ by the transforms, and 2) psychoacoustic effects relating to the attractiveness of the voice. This paper is a follow-on from that analysis and aims to separate these effects out. Our latest perceptual experiments focus on attractiveness, using average voices and speaker-dependent voices without model transformation, and show that using several speakers to create a voice improves smoothness (measured by Harmonics-to-Noise Ratio), reduces distance from the the average voice in the log F0-F1 space of the final voice and hence makes it more attractive at the segmental level. However, this is weakened or overridden at supra-segmental or sentence levels.
Sandra Andraszewicz, Junichi Yamagishi, Simon King
Added 20 Aug 2011
Updated 20 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Sandra Andraszewicz, Junichi Yamagishi, Simon King
Comments (0)