Abstract - In this paper, we develop a probabilistic sound localization system that uses three microphones and includes a VAD (Voice Activity Detection) component, together with a face tracking system that uses a vision camera. We also propose a way to integrate the two systems in order to compensate for errors in speaker localization and to effectively reject unwanted speech or noise signals arriving from undesired directions. To verify the performance of the system, we installed the proposed audition and vision systems on a prototype robot called IROBAA (Intelligent ROBot for Active Audition) and showed how an audio-visual system can be integrated.