Noisy or distorted video and audio training sets pose persistent challenges for automated identification and verification tasks. We propose Mutual Interdependence Analysis (MIA) to extract “mutual features” from a high-dimensional training set. Mutual features represent a class of objects through a unique direction in the span of the inputs that minimizes the scatter of the projected samples of the class. They capture invariant properties of the object class and can therefore be used for classification. The effectiveness of our approach is tested on real data from face and speaker recognition problems. We show that “mutual faces” extracted from the Yale database are illumination invariant, and obtain identification error rates of 2.2% in leave-one-out tests for differently illuminated images. Also, “mutual speaker signatures” for text-independent speaker verification achieve state-of-the-art equal error rates of 6.8% on the NTIMIT database.
Heiko Claussen, Justinian Rosca, Robert I. Damper
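
The abstract's description of a mutual feature, a direction in the span of the inputs that minimizes the scatter of the projected samples, admits a simple closed-form sketch. The Python snippet below is a minimal illustration under one plausible reading: if the projection scatter can be driven to zero, the mutual feature is the vector in span(X) whose projection onto every training sample is equal. The function name mutual_feature and the regularization term are our illustrative choices, not the paper's reference implementation.

    import numpy as np

    def mutual_feature(X, reg=1e-8):
        """Illustrative MIA sketch (an assumption, not the authors' code).

        X : (d, n) array whose n columns are training samples of one class.
        Returns a unit vector w in span(X) whose projections x_i^T w are
        (up to regularization) equal across samples, i.e. the scatter of
        the projected samples is minimized.
        """
        d, n = X.shape
        G = X.T @ X + reg * np.eye(n)       # Gram matrix, regularized for stability
        a = np.linalg.solve(G, np.ones(n))  # coefficients of w in the sample basis
        w = X @ a                           # w lies in the span of the inputs
        return w / np.linalg.norm(w)

    # Toy usage: samples share a common component c plus per-sample noise,
    # mimicking a class with an invariant property under distortion.
    rng = np.random.default_rng(0)
    d, n = 100, 10
    c = rng.standard_normal(d)
    X = c[:, None] + 0.1 * rng.standard_normal((d, n))
    w = mutual_feature(X)
    print(X.T @ w)  # projections are nearly equal, so their scatter is minimal

In this toy setting the recovered w aligns with the shared component c, which is the sense in which a mutual feature captures an invariant property of the class; classification or verification would then compare a test sample's projection against each class's mutual feature.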