We propose a new framework for speaker recognition, referred to as Fishervoice. It includes the design of a feature representation known as the structured score vector (SSV), which relates acoustic structures to “key” frames in an input utterance in order to capture relevant speaker characteristics. The framework also applies nonparametric Fisher’s discriminant analysis to map the SSVs into a compressed discriminant subspace, where a test sample is matched against reference speaker samples to perform speaker recognition. The objective is to reduce intra-speaker variability and emphasize discriminative class boundary information. Experiments on the XM2VTSDB corpus show that the Fishervoice framework outperforms other commonly used approaches, e.g. GMM-UBM and Eigenvoice.
Zhifeng Li, Weiwu Jiang, Helen M. Meng
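
The projection-and-matching step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: classical (parametric) Fisher discriminant analysis is used as a simpler stand-in for the nonparametric variant, synthetic random vectors stand in for SSVs, matching is done by Euclidean nearest neighbor in the subspace, and the function names `fit_fisher_projection` and `identify` are hypothetical.

```python
import numpy as np


def fit_fisher_projection(X, y, n_components):
    """Classical Fisher discriminant analysis (stand-in for the nonparametric variant).

    X: (n_samples, dim) feature vectors; y: (n_samples,) integer speaker labels.
    Returns a (dim, n_components) projection matrix.
    """
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-speaker scatter
    Sb = np.zeros((dim, dim))  # between-speaker scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(dim)  # small ridge so that Sw is invertible
    # Directions maximizing between-speaker over within-speaker scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real


def identify(W, refs, ref_labels, test_vec):
    """Match a test vector to the nearest reference sample in the subspace."""
    dists = np.linalg.norm(refs @ W - test_vec @ W, axis=1)
    return ref_labels[np.argmin(dists)]


# Synthetic data: 3 hypothetical speakers, 20 vectors each, 10 dimensions.
rng = np.random.default_rng(0)
n_speakers, per_speaker, dim = 3, 20, 10
means = 5.0 * rng.normal(size=(n_speakers, dim))
X = np.vstack([m + 0.5 * rng.normal(size=(per_speaker, dim)) for m in means])
y = np.repeat(np.arange(n_speakers), per_speaker)

W = fit_fisher_projection(X, y, n_components=n_speakers - 1)
test_vec = means[1] + 0.5 * rng.normal(size=dim)  # unseen sample from speaker 1
predicted = identify(W, X, y, test_vec)
```

With C speakers, the between-speaker scatter has rank at most C − 1, which is why the sketch keeps `n_speakers - 1` components; the nonparametric formulation named in the abstract is not bound by this limit in the same way.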