A two microphone direction of arrival (DOA) estimation technique for multiple speech sources is developed which exploits speech specific properties, namely sparsity in time-frequency (spectrum) domain. For robustness, we exploit the sparsity in the frequency domain by focusing on the spectral content concentrated in sinusoidal tracks obtained through sinusoidal modeling. When multiple speeches are mixed in the two microphone system, the inter-channel phase differences (IPD) between the dual channels on those sinusoidal tracks will be dominated by the spatial information of the most powerful source at that specific time-frequency point because of the spectrum sparsity and masking effects. Thereby, the source localization problem is turned into a clustering problem on the IPD versus frequency plot, and the generalized mixture decomposition algorithm (GMDA) is used to cluster the groups of points corresponding to multiple sources. The DOA of each source is derived from the parameters o...
Wenyi Zhang, Bhaskar D. Rao