The problem of detecting overlapped speech in stereo recordings using close-talk microphones is important for a variety of applications including the identification of back-channels, interruptions etc. in a dyadic or multi-party interactions. For detecting overlapped speech, we propose a feature derived using the spectral similarity of two channels over a range of acoustic frames. During overlapped speech frames the proposed spectro-temporal similarity-based feature values decrease and during non-overlapped speech frames the feature values increase due to the presence of cross-talk. Thus the proposed feature helps to discriminate the overlapped speech frames from the nonoverlapped ones. Using overlapped speech detection experiments on a dyadic interaction corpus, it is shown that the proposed feature provides a significant improvement, ∼26% absolute, in the accuracy of detecting the overlapped speech frames when used as an additional feature to the baseline feature obtained from t...
Bo Xiao, Prasanta Kumar Ghosh, Panayiotis G. Georg