We present an audio-visual person authentication system that extracts several novel Visualized Speech Features (VSF) from the spoken password and from multiple face profiles captured through a simple user interface, and combines these features to deliver high performance and resilience against impostor attacks. The spoken password is converted to a string of images formed by several visualized speech features, and a compressed form of these VSFs preserves speaker identity in a compact manner. Simulation results on an in-house 210-user AV-user-ID database, collected from a wide variety of users in real-life office environments, demonstrate fully separable client and impostor score distributions (0% EER), while requiring lower storage and computational complexity than conventional AV user-recognition methods.
Amitava Das, Ohil K. Manyam, Makarand Tapaswi
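The abstract does not define the exact VSF extraction, compression, or matching used in the paper. The sketch below is only a minimal illustration of the general pipeline it describes, under stated assumptions: the "visualized speech feature" is approximated here by a log power spectrogram image, the "compressed form" by truncated low-order 2-D DCT coefficients, and matching by cosine similarity. All function names and parameters are illustrative, not the authors' method.

```python
# Minimal sketch of a VSF-style pipeline (assumptions, not the paper's method):
# spoken password -> spectrogram-like "image" -> compact DCT template -> score.
import numpy as np
from scipy.fftpack import dct


def visualized_speech_feature(signal, n_fft=512, hop=160):
    """Turn a spoken-password waveform into a spectrogram-like image
    (log power spectrum per frame); a stand-in for the paper's VSF."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft, hop):
        frame = signal[start:start + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        frames.append(np.log(power + 1e-10))
    return np.array(frames).T            # shape: (freq bins, time frames)


def compress_vsf(vsf_image, keep=(16, 16)):
    """Keep only low-order 2-D DCT coefficients as a compact template,
    mimicking the 'compressed form of these VSFs'."""
    coeffs = dct(dct(vsf_image, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:keep[0], :keep[1]].flatten()


def match_score(template, probe):
    """Cosine similarity between an enrolled template and a probe."""
    denom = np.linalg.norm(template) * np.linalg.norm(probe) + 1e-10
    return float(np.dot(template, probe) / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enrol = rng.standard_normal(16000)            # 1 s of synthetic "audio"
    probe = enrol + 0.05 * rng.standard_normal(16000)
    t = compress_vsf(visualized_speech_feature(enrol))
    p = compress_vsf(visualized_speech_feature(probe))
    print("client-like score:", match_score(t, p))
```

In an actual system, the same compact template would be stored per user at enrollment and fused with face-profile scores at verification; the fusion rule is likewise not specified in this excerpt.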