Our goal is to automatically learn a perceptually optimal target cost function for a unit selection speech synthesiser. The approach we take here is to train a classifier on human perceptual judgements of synthetic speech. The output of the classifier is used to make a simple three-way distinction rather than to estimate a continuously valued cost. In order to collect the necessary perceptual data, we synthesised 145,137 short sentences with the usual target cost switched off, so that the search was driven by the join cost only. We then selected the 7200 sentences with the best joins and asked 60 listeners to judge them, rating each syllable. From this, we derived a rating for each demiphone. Using as input the same context features employed in our conventional target cost function, we trained a classifier on these human perceptual ratings. We synthesised two sets of test sentences with both our standard target cost and the new target cost based on the classifier....
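The core idea of mapping context features to a three-way perceptual label can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual system: the feature values, the rating labels, and the majority-vote classifier with a global-majority backoff are all assumptions chosen for clarity; the paper's classifier and context feature set are not specified here.

```python
from collections import Counter, defaultdict

# Hypothetical demiphone training examples: (context features, listener-derived
# rating). The features and labels below are illustrative only.
data = [
    (("stressed", "vowel", "phrase-final"), "good"),
    (("stressed", "vowel", "phrase-medial"), "good"),
    (("unstressed", "vowel", "phrase-medial"), "neutral"),
    (("unstressed", "consonant", "phrase-medial"), "bad"),
    (("stressed", "consonant", "phrase-final"), "neutral"),
    (("unstressed", "consonant", "phrase-final"), "bad"),
]

def train(examples):
    """Return a classifier giving the majority rating for each seen context,
    falling back to the overall majority rating for unseen contexts."""
    by_context = defaultdict(Counter)
    overall = Counter()
    for context, rating in examples:
        by_context[context][rating] += 1
        overall[rating] += 1
    backoff = overall.most_common(1)[0][0]
    table = {c: counts.most_common(1)[0][0] for c, counts in by_context.items()}
    return lambda context: table.get(context, backoff)

classify = train(data)
print(classify(("stressed", "vowel", "phrase-final")))  # seen context
print(classify(("unseen", "context", "features")))      # backoff rating
```

At synthesis time, the three-way label for each candidate unit could then be turned into a coarse target cost (e.g. low, medium, high) rather than a continuously valued one, mirroring the distinction described above.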