Vocal attractiveness of statistical speech synthesisers



Our previous analysis of speaker-adaptive HMM-based speech synthesis methods suggested that there are two possible reasons why average voices can obtain higher subjective scores than any individual adapted voice: 1) model adaptation degrades speech quality proportionally to the distance ‘moved’ by the transforms, and 2) psychoacoustic effects relating to the attractiveness of the voice. This paper is a follow-on from that analysis and aims to separate these effects. Our latest perceptual experiments focus on attractiveness, using average voices and speaker-dependent voices without model transformation. They show that using several speakers to create a voice improves smoothness (measured by Harmonics-to-Noise Ratio) and reduces the distance of the final voice from the average voice in the log F0-F1 space, and hence makes it more attractive at the segmental level. However, this effect is weakened or overridden at the supra-segmental or sentence level.
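As a rough illustration of what "distance from the average voice in the log F0-F1 space" could mean, the sketch below computes a Euclidean distance between two voices after taking the natural log of their F0 and F1 values. The abstract does not give the paper's exact distance definition, so the Euclidean metric, the function name `log_f0_f1_distance`, and the example values in Hz are all assumptions made for illustration, not the authors' method or data.

```python
import math


def log_f0_f1_distance(voice, average):
    """Euclidean distance between two (F0, F1) pairs in log space.

    Both arguments are (F0, F1) tuples in Hz. The choice of a Euclidean
    metric is an assumption; the abstract does not specify the metric.
    """
    d_f0 = math.log(voice[0]) - math.log(average[0])
    d_f1 = math.log(voice[1]) - math.log(average[1])
    return math.hypot(d_f0, d_f1)


# Hypothetical mean F0 and F1 values in Hz, not taken from the paper.
average_voice = (120.0, 500.0)
speaker_voice = (150.0, 550.0)

print(log_f0_f1_distance(speaker_voice, average_voice))
```

Under this assumed definition, a voice built from several speakers would be expected to yield a smaller value than a single-speaker voice, which is the direction of the effect the abstract reports.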

Sandra Andraszewicz, Junichi Yamagishi, Simon King


Average Voice | ICASSP 2011 | Individual Adapted Voice | Signal Processing | Speaker-adaptive HMM-based Speech


Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Sandra Andraszewicz, Junichi Yamagishi, Simon King
