INTERSPEECH
2010

A classifier-based target cost for unit selection speech synthesis trained on perceptual data

Our goal is to automatically learn a perceptually-optimal target cost function for a unit selection speech synthesiser. The approach we take here is to train a classifier on human perceptual judgements of synthetic speech. The output of the classifier is used to make a simple three-way distinction rather than to estimate a continuously-valued cost. In order to collect the necessary perceptual data, we synthesised 145,137 short sentences with the usual target cost switched off, so that the search was driven by the join cost only. We then selected the 7200 sentences with the best joins and asked 60 listeners to judge them, providing their ratings for each syllable. From this, we derived a rating for each demiphone. Using as input the same context features employed in our conventional target cost function, we trained a classifier on these human perceptual ratings. We synthesised two sets of test sentences with both our standard target cost and the new target cost based on the classifier....
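The idea of replacing a continuous target cost with a three-way classifier decision can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the label names (`good`, `unsure`, `bad`), the cost assigned to each label, the encoding of context features as binary mismatch flags, and the toy nearest-neighbour classifier, which simply stands in for whatever classifier the authors actually trained.

```python
from collections import Counter

# Hypothetical three-way labels and the cost assigned to each (assumed values).
CLASS_COST = {"good": 0.0, "unsure": 0.5, "bad": 1.0}


def hamming(a, b):
    """Count the context features on which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class NearestNeighbourClassifier:
    """Toy k-NN classifier; a stand-in, not the classifier used in the paper."""

    def __init__(self, k=1):
        self.k = k
        self.examples = []

    def fit(self, features, labels):
        # Store the training pairs; k-NN needs no further training step.
        self.examples = list(zip(features, labels))
        return self

    def predict(self, feature):
        # Take the k stored examples closest to the query and vote on the label.
        nearest = sorted(self.examples, key=lambda ex: hamming(ex[0], feature))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def target_cost(classifier, context_features):
    """Map the classifier's three-way decision to a target cost."""
    return CLASS_COST[classifier.predict(context_features)]


# Invented training data: each tuple holds hypothetical context-mismatch flags,
# each label is a hypothetical perceptual rating derived for one unit.
train_x = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
train_y = ["good", "good", "bad", "bad", "unsure", "unsure"]

clf = NearestNeighbourClassifier(k=1).fit(train_x, train_y)
for candidate in [(0, 0, 0), (0, 1, 0), (1, 1, 1)]:
    print(candidate, target_cost(clf, candidate))
```

In a unit selection search, a cost of this kind would be combined with the join cost for each candidate unit; how the two are weighted is not stated in the abstract and is left out of the sketch.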
Volker Strom, Simon King
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where INTERSPEECH
Authors Volker Strom, Simon King