Fang Liu, Xueyin Lin, Stan Z. Li, Yuanchun Shi

This paper presents a Bayesian-network-based multimodal fusion method for robust, real-time face tracking. The Bayesian network integrates a prior based on second-order system dynamics with likelihood cues from color, edge, and face appearance. Because different modalities carry different levels of confidence, we encode the environmental factors that govern each modality's confidence into the Bayesian network, and we develop a Fisher discriminant analysis method for learning the optimal fusion. The resulting tracker can follow multiple faces under varying poses. It operates in two stages: first, hypotheses are generated efficiently using a coarse-to-fine strategy; then, the modalities are integrated in the Bayesian network to evaluate the posterior of each hypothesis. The hypothesis that maximizes the posterior (MAP) is selected as the estimate of the object state. Experimental results demonstrate the robustness and real-time performance of our face tracking approach.
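As an illustration of the MAP selection step described above, the posterior of a state hypothesis can be written in a commonly used factored form. The notation below is ours, not the paper's: $z^{c}_t$, $z^{e}_t$, and $z^{a}_t$ denote the color, edge, and appearance observations at time $t$, and the cues are assumed conditionally independent given the state, an assumption the paper's Bayesian network may refine:

$$
\hat{x}_t \;=\; \arg\max_{x_t}\; p\!\left(z^{c}_t \mid x_t\right)\, p\!\left(z^{e}_t \mid x_t\right)\, p\!\left(z^{a}_t \mid x_t\right)\, p\!\left(x_t \mid x_{t-1}, x_{t-2}\right),
$$

where the prior $p(x_t \mid x_{t-1}, x_{t-2})$ reflects the second-order system dynamics by conditioning the current state on the two preceding states.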