Non-parallel training for voice conversion based on FT-GMM

This paper presents a non-parallel training algorithm for voice conversion based on the feature transform Gaussian mixture model (FT-GMM), a mixture model of the joint density space of the source and target speakers with explicit feature transform modeling. In FT-GMM, the correlations between the two speakers' distributions in each mixture component are not modeled directly but are absorbed into these explicit feature transformations. This makes it possible to extend the model to non-parallel training by simply decomposing it into two sub-models, one for each speaker, and optimizing them separately. A frequency warping process is adopted to compensate for the performance degradation caused by the original spectral distance between the source and target speakers. Cross-gender experimental results show that the proposed method achieves performance comparable to that of parallel training.
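The abstract does not say which frequency warping function is used, so the following is only a generic illustration of the idea, not the authors' method. It sketches a piecewise-linear warp of the normalized frequency axis, a form commonly used for vocal tract length normalization, and resamples a spectral envelope at the warped positions. The function names `warp_frequency` and `warp_envelope`, the warping factor `alpha`, and the `knee` breakpoint are all assumptions made for this example.

```python
def warp_frequency(w, alpha, knee=0.7):
    """Piecewise-linear warp of a normalized frequency w in [0, 1].

    Hypothetical helper: below `knee` the axis is scaled by `alpha`;
    above it, a second linear segment brings the curve back to 1.0,
    so both endpoints of the axis are preserved.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if not 0.0 < knee < 1.0 or alpha <= 0.0 or alpha * knee >= 1.0:
        raise ValueError("need 0 < knee < 1, alpha > 0 and alpha * knee < 1")
    if w <= knee:
        return alpha * w
    slope = (1.0 - alpha * knee) / (1.0 - knee)
    return alpha * knee + slope * (w - knee)


def warp_envelope(envelope, alpha, knee=0.7):
    """Resample a spectral envelope at warped frequency positions.

    Output bin i takes the (linearly interpolated) value of the input
    envelope at the warped position of bin i. With alpha > 1 each output
    bin reads from a higher input frequency, which shifts spectral peaks
    toward lower frequencies; alpha < 1 shifts them upward.
    """
    n = len(envelope)
    if n < 2:
        raise ValueError("envelope needs at least two bins")
    warped = []
    for i in range(n):
        pos = warp_frequency(i / (n - 1), alpha, knee) * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))  # guard against rounding
        lo = min(int(pos), n - 2)
        frac = pos - lo
        warped.append((1.0 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped


if __name__ == "__main__":
    ramp = [float(k) for k in range(11)]  # toy envelope, not real speech data
    print(warp_envelope(ramp, alpha=1.2))
```

In a cross-gender setting, a warp of this kind would be applied to one speaker's spectra before training to reduce the spectral mismatch between the two speakers; how `alpha` is chosen, and whether a single global factor suffices, is left open here because the abstract gives no details.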

Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai

Explicit Feature | ICASSP 2011 | Mixture Model | Signal Processing | Target Speakers |

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai
