Simple methods for improving speaker-similarity of HMM-based speech synthesis

Available from www.cstr.ed.ac.uk

In this paper we revisit some basic configuration choices of HMM-based speech synthesis, such as waveform sampling rate, auditory frequency warping scale and the logarithmic scaling of F0, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. All of the techniques investigated are simple but, as we demonstrate using perceptual tests, they can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling rates can offer enhanced feature extraction and improved speaker similarity for speech synthesis. In addition, a generalized logarithmic transform of F0 results in larger intra-utterance variance of F0 trajectories and hence more dynamic and natural-sounding prosody.
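To make the generalized logarithmic transform of F0 concrete, here is a minimal Python sketch. It assumes the Box-Cox form of the generalized logarithm, g_λ(x) = (x^λ - 1)/λ, which reduces to ln x as λ approaches 0. The abstract does not state the exact parameterization or the λ values used, so treat both as assumptions. The function names (`generalized_log`, `inverse_generalized_log`, `transform_voiced`) are hypothetical, and unvoiced frames are assumed to be marked with F0 = 0 and excluded from the transform.

```python
import math
from typing import List, Optional


def generalized_log(x: float, lam: float) -> float:
    """Box-Cox generalized logarithm (assumed form): (x**lam - 1) / lam.

    Reduces to the natural logarithm when lam == 0.
    """
    if x <= 0.0:
        raise ValueError("F0 must be positive; apply only to voiced frames")
    if lam == 0.0:
        return math.log(x)
    # expm1 keeps (x**lam - 1) accurate when lam is close to zero
    return math.expm1(lam * math.log(x)) / lam


def inverse_generalized_log(y: float, lam: float) -> float:
    """Map a transformed value back to F0 in Hz."""
    if lam == 0.0:
        return math.exp(y)
    # Inverse of the Box-Cox form: (lam * y + 1) ** (1 / lam)
    return math.exp(math.log1p(lam * y) / lam)


def transform_voiced(f0_hz: List[float], lam: float) -> List[Optional[float]]:
    """Transform voiced frames; unvoiced frames (F0 == 0) become None."""
    return [generalized_log(f, lam) if f > 0.0 else None for f in f0_hz]


if __name__ == "__main__":
    # Illustrative contour only; 0.0 marks an unvoiced frame (assumption)
    contour = [110.0, 125.0, 140.0, 0.0, 132.0]
    for lam in (0.0, 0.3):
        print(lam, transform_voiced(contour, lam))
```

Because the slope of g_λ is x^(λ-1), choosing λ > 0 compresses high F0 values less than the plain logarithm does, so a given F0 excursion spans a wider range in the modelled domain. Whether this reproduces the variance gains reported in the paper depends on the rest of the training and generation pipeline, which this sketch does not model.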

Junichi Yamagishi, Simon King

HMM-based Speech Synthesisers | ICASSP 2010 | Signal Processing | Speaker Similarity | Speech Synthesis |

» Utilizing glottal source pulse library for generating improved excitation signal for HMMba...

» Analysis of statistical parametric and unit selection speech synthesis systems applied to ...

» HMM-based speech synthesiser using the LF-model of the glottal source

» Auditory universal accessibility of data tables using naturally derived prosody specificat...

Post Info

Added	26 Jan 2011
Updated	26 Jan 2011
Type	Conference
Year	2010
Where	ICASSP
Authors	Junichi Yamagishi, Simon King

Sciweavers