This paper presents a localization and tracking system that integrates multiple sensors. Object localization results from the local sensor systems are fused using a decentralized Kalman filter. The system is evaluated on audiovisual speaker tracking, combining a video-based face tracker with a microphone array. A quantitative analysis shows that the bimodal tracking system can deliver more robust and reliable results than either modality alone.
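To illustrate how a decentralized Kalman filter can combine local estimates, the following is a minimal sketch of the fusion step in information form. It is not the paper's implementation. It assumes that every local node (for example, a face tracker and a microphone array) starts from the same global prediction `(x_pred, P_pred)`, reports its locally updated 2-D position estimate `(x_i, P_i)`, and that measurement noise is independent across nodes. The function names `fuse`, `inv2`, `matvec`, `madd` and `vadd`, as well as the example numbers, are hypothetical.

```python
# Sketch of decentralized Kalman fusion in information form for a 2-D position.
# Assumptions (not from the paper): all nodes share the global prediction
# (x_pred, P_pred), and their measurement noises are mutually independent.


def inv2(m: list[list[float]]) -> list[list[float]]:
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def madd(a, b, s: float = 1.0):
    """Return a + s*b for 2x2 matrices."""
    return [[a[r][c] + s * b[r][c] for c in range(2)] for r in range(2)]


def vadd(a, b, s: float = 1.0):
    """Return a + s*b for 2-vectors."""
    return [a[k] + s * b[k] for k in range(2)]


def fuse(x_pred, P_pred, local_estimates):
    """Fuse local posteriors: Y = Y_pred + sum(Y_i - Y_pred), y likewise."""
    Y_pred = inv2(P_pred)             # prior information matrix
    y_pred = matvec(Y_pred, x_pred)   # prior information vector
    Y, y = Y_pred, y_pred
    for x_i, P_i in local_estimates:
        Y_i = inv2(P_i)
        y_i = matvec(Y_i, x_i)
        # Each node contributes only the new information it gained over the prior.
        Y = madd(Y, madd(Y_i, Y_pred, -1.0))
        y = vadd(y, vadd(y_i, y_pred, -1.0))
    P = inv2(Y)
    return matvec(P, y), P


if __name__ == "__main__":
    prior_x, prior_P = [0.0, 0.0], [[4.0, 0.0], [0.0, 4.0]]
    video = ([1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]])  # hypothetical face-tracker output
    audio = ([2.0, 1.0], [[2.0, 0.0], [0.0, 1.0]])  # hypothetical microphone-array output
    x, P = fuse(prior_x, prior_P, [video, audio])
    print("fused position:", x)
    print("fused covariance:", P)
```

With these hypothetical inputs the fused estimate is (1.6, 1.6) with a variance of 0.8 on each axis, which is lower than the best single-node variance of 1.0 on either axis. This is a small numerical illustration of why a bimodal system can be more reliable than each modality on its own. The information form is chosen here because fusion reduces to adding information contributions, which suits a decentralized architecture in which nodes only transmit their local estimates.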