With the development of voice transformation and speech synthesis technologies, speaker identification systems are likely to face attacks from imposters who use voice-transformed or synthesized speech to mimic a particular speaker. In this paper, we therefore investigated how speaker identification systems perform on voice-transformed speech. We conducted experiments with two different approaches: the classical GMM-based speaker identification system and the Phonetic speaker identification system. Our experimental results showed that current standard voice transformation techniques are able to fool the GMM-based system but not the Phonetic speaker identification system. These findings imply that future speaker identification systems should include idiosyncratic knowledge in order to successfully distinguish transformed speech from natural speech and thus be armed against imposter attacks.
Qin Jin, Arthur R. Toth, Alan W. Black, Tanja Schu