This paper presents the Athens Information Technology system for 3D person tracking and the results it obtained in the CLEAR 2007 evaluations. The system uses audiovisual information from multiple acoustic and video sensors. It comprises a video subsystem and an audio subsystem, whose results are combined to track the last active speaker. The video subsystem combines the outputs of several 2D face localization systems in 3D, aiming to track all people present in a room. The audio subsystem applies an information-theoretic metric to an ensemble of microphones to estimate the active speaker.