Rapid speaker adaptation with speaker adaptive training and non-negative matrix factorization

In this paper, we describe a novel speaker adaptation algorithm based on Gaussian mixture weight adaptation. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These base vectors encode the correlations between Gaussian activations as learned from the training data. Expressing the speaker-dependent Gaussian mixture weights as a linear combination of a small number of base vectors reduces the number of parameters that must be estimated from the enrollment data. In order to learn meaningful correlations between Gaussian activations from the training data, the NMF-based weight adaptation was combined with speaker adaptive training based on vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR). Evaluation on the 5k closed and 20k open vocabulary Wall Street Journal tasks shows a 4% relative word error rate reduction over the speaker-independent recognition system, which already incorporates VTL...
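
The weight adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single vector of Gaussian weights per speaker, uses the standard multiplicative updates for KL-divergence NMF, and runs on random toy data. The names `nmf_kl`, `adapt_weights`, `V`, `W`, `H` and `counts` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (Gaussians x speakers) as W @ H.

    Uses multiplicative updates for the KL divergence (an assumption;
    the abstract does not state which NMF cost or solver is used).
    """
    rng = np.random.default_rng(seed)
    n_gauss, n_spk = V.shape
    W = rng.random((n_gauss, rank)) + eps
    H = rng.random((rank, n_spk)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Rescale so every base vector (column of W) sums to one;
    # H absorbs the scale, leaving the product W @ H unchanged.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H


def adapt_weights(W, counts, n_iter=100, eps=1e-12):
    """Estimate a new speaker's weights as a non-negative combination of W.

    Only `rank` coefficients are fitted to the enrollment occupancy
    counts; the base vectors in W stay fixed.
    """
    rank = W.shape[1]
    h = np.full(rank, counts.sum() / rank)
    for _ in range(n_iter):
        Wh = W @ h + eps
        h *= (W.T @ (counts / Wh)) / (W.sum(axis=0) + eps)
    w = W @ h
    return w / w.sum()


# Toy data (an assumption): 8 Gaussians, 20 training speakers.
rng = np.random.default_rng(1)
V = rng.dirichlet(np.ones(8), size=20).T
W, H = nmf_kl(V, rank=3)

# Hypothetical occupancy counts from a short enrollment utterance.
counts = rng.integers(1, 15, size=8).astype(float)
w_adapted = adapt_weights(W, counts)
```

In this sketch the enrollment step fits 3 coefficients instead of 8 free weights, which is the parameter reduction the abstract describes; in a real recognizer the number of Gaussians is far larger, so the saving is correspondingly greater.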

Xueru Zhang, Kris Demuynck, Hugo Van hamme

Gaussian | Gaussian Activations | ICASSP 2011 | Signal Processing | Weight Adaptation |

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Xueru Zhang, Kris Demuynck, Hugo Van hamme
