This paper presents a novel approach to multimodal information fusion. The proposed method is based on kernel cross-modal factor analysis (KCFA), in which the optimal transformations that represent the coupled patterns between two different subsets of features are identified by minimizing the Frobenius norm in the transformed domain. The method generalizes linear cross-modal factor analysis (CFA) via the kernel trick to model the nonlinear relationship between two multidimensional variables. The effectiveness of the introduced solution is demonstrated through experiments on an audiovisual emotion recognition problem. Experimental results show that the proposed approach outperforms concatenation-based feature-level fusion, linear CFA, as well as the canonical correlation analysis (CCA) and kernel CCA methods.
Yongjin Wang, Ling Guan, Anastasios N. Venetsanopoulos
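As a point of reference, the criterion the abstract describes can be sketched as follows; the symbols X, Y, W_x, and W_y are notation assumed here for illustration, not taken verbatim from the paper. Given centered feature matrices X and Y from the two modalities, linear CFA seeks orthogonal transformations solving

\[
\min_{W_x,\, W_y} \left\| X W_x - Y W_y \right\|_F^2
\quad \text{subject to} \quad
W_x^{\top} W_x = I, \;\; W_y^{\top} W_y = I,
\]

a problem whose solution can be obtained from the singular value decomposition of X^T Y. KCFA applies the same Frobenius-norm criterion after implicitly mapping X and Y into kernel-induced feature spaces via the kernel trick, so that the coupled patterns between the two feature subsets are modeled nonlinearly.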