Just as a motion field is associated with a moving object, an audio field can be associated with an object that acts as a sound source. The flow field of a sound source that moves over time therefore has not only an optical component but also an audio component; we call the combination audio-visual flow. In this paper we present a common structure-tensor-based variational framework for dense audio-visual flow-field estimation. The proposed scheme improves the rank of the local structure tensor by incorporating an audio information channel that is substantially uncorrelated with the complementary visual information channel. The scheme also allows weights to be assigned to individual sensor modalities according to the confidence in their measurements. Results are presented to demonstrate how combining multiple modalities in the proposed framework can provide a possible solution to temporary full visual occlusions.
Raffay Hamid, Aaron F. Bobick, Anthony J. Yezzi
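
The rank argument in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes a purely spatial 2x2 tensor built as a confidence-weighted sum of per-channel gradient outer products, and the names `weighted_tensor`, `visual_grad`, `audio_grad` and the weights 0.7 and 0.3 are hypothetical choices made only for illustration. A single channel yields a rank-deficient tensor (zero determinant), while a second channel whose gradient is not parallel to the first makes the tensor full rank.

```python
def outer(g):
    """Outer product g g^T of a 2-vector, returned as a 2x2 nested list."""
    return [[g[0] * g[0], g[0] * g[1]],
            [g[1] * g[0], g[1] * g[1]]]


def weighted_tensor(gradients, weights):
    """Sum of w_k * g_k g_k^T over channels k (assumed form, not the paper's)."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for g, w in zip(gradients, weights):
        O = outer(g)
        for i in range(2):
            for j in range(2):
                J[i][j] += w * O[i][j]
    return J


def det2(J):
    """Determinant of a 2x2 matrix; zero means the tensor is rank deficient."""
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]


# Hypothetical local gradients: the two channels vary along different directions.
visual_grad = (1.0, 0.0)
audio_grad = (0.0, 1.0)

# Visual channel alone: one gradient direction, so the tensor has rank one.
J_visual = weighted_tensor([visual_grad], [1.0])

# Both channels, weighted by hypothetical confidences 0.7 (visual) and 0.3 (audio).
J_both = weighted_tensor([visual_grad, audio_grad], [0.7, 0.3])

print(det2(J_visual))      # zero determinant: flow is locally under-constrained
print(det2(J_both) > 0.0)  # positive determinant: the tensor is full rank
```

In this toy setting, lowering the visual weight toward zero leaves the audio term as the only contribution to the tensor, which is one way to picture how a second modality could carry the estimate through a visual occlusion.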