Localization and classification of acoustic signals in a complex auditory scene are everyday tasks for the human auditory system. For a computational system, however, this problem presents a significant challenge. In this paper, we propose a framework that allows the detection of different classes of signals (e.g., speech, music) and the localization of their sources. As a special application of this framework, we describe a method, based on vowel detection, for localizing human speech in a complex acoustic scene at low SNR. In contrast to other works, we use a prior model of the target signals to detect and extract those parts of the signal mixture that correspond exclusively to a single class. This makes reliable localization of the signal sources possible.