We investigate several feature normalization and scaling approaches for use in speaker verification based on support vector machines. We are particularly interested in methods that are “knowledge-free” and work for a variety of features, leading us to investigate MLLR transforms, phone N-grams, prosodic sequences, and word N-gram features. Normalization methods studied include mean/variance normalization, TFLLR and TFLOG scaling, and a simple nonparametric approach: rank-normalization. We find that rank-normalization is uniformly competitive with other methods, and improves upon them in many cases.
Andreas Stolcke, Sachin S. Kajarekar, Luciana Ferr
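As an illustration of the rank-normalization named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each feature dimension is normalized independently by replacing a value with the fraction of background-set values that fall below it, which maps every feature to the interval [0, 1]. The function names `fit_rank_normalizer` and `rank_normalize`, the use of NumPy, and the handling of ties are assumptions made for this example.

```python
import numpy as np


def fit_rank_normalizer(background):
    """Sort each feature dimension of a background set (rows are samples).

    Assumption: a held-out background set defines the reference distribution.
    """
    background = np.asarray(background, dtype=float)
    return np.sort(background, axis=0)


def rank_normalize(sorted_background, x):
    """Map each feature of x to the fraction of background values below it."""
    x = np.asarray(x, dtype=float)
    n, d = sorted_background.shape
    # For each dimension, count background values strictly less than x[j].
    ranks = np.array(
        [np.searchsorted(sorted_background[:, j], x[j], side="left") for j in range(d)]
    )
    return ranks / n


# Hypothetical toy data: four background vectors with two features each.
background = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
model = fit_rank_normalizer(background)
normalized = rank_normalize(model, [2.5, 100.0])
```

Because the mapping depends only on the ordering of values, this sketch needs no assumption about the shape of each feature's distribution, which is the sense in which the approach is nonparametric.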