Improved F0 modeling and generation in voice conversion



F0 (fundamental frequency) is an acoustic feature that varies greatly from one speaker to another. Its contour is discontinuous at transitions between voiced and unvoiced sounds, which presents an obstacle to GMM modeling for voice conversion. A Multi-Space Distribution (MSD) [5] can be used to model unvoiced and voiced F0 regions in a linearly weighted mixture. However, the use of two incompatible probabilistic spaces, for example a continuous probability density for voiced observations and a discrete probability for unvoiced observations, may result in an imprecise voiced/unvoiced (v/u) conversion in a maximum likelihood (ML) sense. In this paper we propose to use voicing strength, characterized by the magnitude of the normalized correlation coefficient calculated during F0 feature extraction, as an additional feature for improving F0 modeling and the v/u decision in the context of voice conversion. The proposed method was evaluated on male-to-female voice conversion tasks in both Mandarin a...
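To make the notion of voicing strength concrete, the following is a minimal sketch of how a normalized correlation coefficient can be obtained as a by-product of a simple autocorrelation-style F0 search. It is not the authors' extractor: the function names, the 60-400 Hz search range, and the test frame are assumptions chosen for illustration only.

```python
import math


def normalized_correlation(frame, lag):
    """Normalized correlation between a frame and a copy of itself shifted by `lag` samples."""
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # A silent (all-zero) frame has no defined correlation; treat it as 0.
    return num / den if den > 0.0 else 0.0


def f0_and_voicing_strength(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, voicing_strength) for one frame.

    The search range f0_min..f0_max is an assumed default, not a value
    taken from the paper. Returns (0.0, 0.0) when no lag has a positive
    correlation, e.g. for a silent frame.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = normalized_correlation(frame, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag == 0:
        return 0.0, 0.0
    # The magnitude of the peak correlation serves as the voicing strength.
    return sample_rate / best_lag, abs(best_corr)


# Illustrative input: a synthetic 100 Hz sine sampled at 8 kHz (50 ms frame).
sample_rate = 8000
voiced_frame = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(400)]
f0, strength = f0_and_voicing_strength(voiced_frame, sample_rate)

# A silent frame yields no F0 and zero voicing strength.
silent_f0, silent_strength = f0_and_voicing_strength([0.0] * 400, sample_rate)
```

A strongly periodic frame gives a strength close to 1, while noise-like or silent frames give values near 0. Unlike a hard v/u flag, this value is continuous, which is the property the abstract relies on when it proposes voicing strength as an additional feature alongside F0.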

Aki Kunikoshi, Yao Qian, Frank K. Soong, Nobuaki M


ICASSP 2011 | Incompatible Probabilistic Spaces | Male-to-female Voice Conversion | Signal Processing | Voice Conversion |


Post Info

Added	21 Aug 2011
Updated	21 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Aki Kunikoshi, Yao Qian, Frank K. Soong, Nobuaki Minematsu
