Multi-pitch estimation of co-channel speech is especially challenging when the underlying pitch tracks are close in pitch value (e.g., when pitch tracks cross). Building on our previous work in [1], we demonstrate the utility of a two-dimensional (2-D) analysis method of speech for this problem by exploiting its joint representation of pitch and pitch-derivative information from distinct speakers. Specifically, we propose a novel multi-pitch estimation method consisting of 1) a datadriven classifier for pitch candidate selection, 2) local pitch and pitch-derivative estimation by k-means clustering, and 3) a Kalman filtering mechanism for pitch tracking and assignment. We evaluate our method on a database of allvoiced speech mixtures and illustrate its capability to estimate pitch tracks in cases where pitch tracks are separate and when they are close in pitch value (e.g., at crossings).
Tianyu T. Wang, Thomas F. Quatieri