Multi-modal person authentication systems can achieve higher performance and robustness by combining different modalities. Current fusion strategies combine modalities mainly at the level of the outputs of the individual modalities. However, there are detailed correlations between facial movement and the speech signal. In this paper, Audio/Visual association, a lower-level fusion, is proposed to fuse the information between lip movement and the speech signal. The experimental results indicate that this type of fusion strategy improves the performance of a multi-modal person authentication system.
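The contrast between the two fusion levels can be sketched as follows. This is a minimal illustration, not the paper's method: the feature names, dimensions, and the plain concatenation are assumptions chosen to show where in the pipeline "lower-level" fusion acts, relative to combining classifier outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synchronized per-frame features (dimensions are
# illustrative, not from the paper): 13-dim audio features and
# 6-dim lip-shape features over T frames.
T = 100
audio_feats = rng.standard_normal((T, 13))
lip_feats = rng.standard_normal((T, 6))

def score_level_fusion(audio_score, visual_score, w=0.5):
    """Output-level fusion: weighted sum of the scores produced by
    independent audio and visual classifiers."""
    return w * audio_score + (1.0 - w) * visual_score

def feature_level_fusion(audio, visual):
    """Lower-level fusion: concatenate frame-aligned features so a
    single model can exploit audio-visual correlations directly."""
    assert audio.shape[0] == visual.shape[0], "streams must be frame-aligned"
    return np.concatenate([audio, visual], axis=1)

fused = feature_level_fusion(audio_feats, lip_feats)
print(fused.shape)  # (100, 19)
```

Score-level fusion can only reweight per-modality decisions, whereas the concatenated representation exposes the joint audio-visual structure to the classifier itself.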
Ming S. Liu, Thomas S. Huang