Reverberant environments pose a challenge to speech acquisition from distant microphones, and approaches using microphone arrays have met with limited success. Recent research using audio-visual sensors for tasks such as speaker localization has shown improvements over traditional audio-only approaches. Using computer vision techniques, we can estimate the orientation of the speaker's head in addition to the location of the speaker. In this paper we study the utility of head pose information for effective beamforming and clean speech acquisition from distant microphones. We present the improvements in speech recognition accuracy relative to that of a close-talking microphone, and the results provide sufficient motivation for incorporating head pose information into beamforming techniques.
Shankar T. Shivappa, Bhaskar D. Rao, Mohan M. Trivedi