Combining five acoustic level modeling methods for automatic speaker age and gender recognition



This paper presents a novel automatic speaker age and gender identification approach that combines five different methods at the acoustic level to improve on the baseline performance. The five subsystems are (1) a Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors, (4) an SVM based on GMM 'Tandem' supervectors, and (5) an SVM baseline system based on the 450-dimensional feature vectors, including prosodic features at the utterance level, provided by the challenge organizing committee. To improve the overall classification performance, these five subsystems are fused at the score level. The proposed fusion system achieves 52.7% unweighted accuracy for the joint age-gender classification task and outperforms the GMM-MFCC system and SVM baseline, respectively, by 9.6% and 8.2% absolute imp...
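To make the two central ideas of the abstract concrete, here is a minimal Python sketch of score-level fusion and of unweighted accuracy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which fusion rule or score normalization was used, so the weighted sum over z-normalized scores is assumed here, and the function names (`z_normalize`, `fuse_scores`, `predict`, `unweighted_accuracy`) and the weights are hypothetical. Unweighted accuracy is taken to mean the average of per-class recalls, which is how the term is commonly used for imbalanced classification tasks.

```python
from typing import List, Sequence


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Scale one subsystem's per-class scores to zero mean, unit variance.

    Assumption: some normalization is needed before fusion because GMM
    log-likelihoods and SVM margins live on different scales.
    """
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n
    return [(s - mean) / std for s in scores]


def fuse_scores(subsystem_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of normalized per-class scores across subsystems.

    The linear rule and the weights are assumptions for illustration.
    """
    if len(subsystem_scores) != len(weights):
        raise ValueError("need exactly one weight per subsystem")
    fused = [0.0] * len(subsystem_scores[0])
    for scores, weight in zip(subsystem_scores, weights):
        for k, s in enumerate(z_normalize(scores)):
            fused[k] += weight * s
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)
```

With five subsystems, `fuse_scores` would take five score lists (one score per age-gender class each) and five weights. Averaging recalls rather than counting correct utterances keeps a classifier from scoring well by always predicting the most frequent class.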

Ming Li, Chi-Sang Jung, Kyu Jeong Han


GMM Maximum | INTERSPEECH 2010 | Mel-frequency Cepstral Coefficient | Signal Processing | SVM Baseline |


Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Journal
Year	2010
Where	INTERSPEECH
Authors	Ming Li, Chi-Sang Jung, Kyu Jeong Han
