Sciweavers

MLMI
2007
Springer

Binaural Speech Separation Using Recurrent Timing Neural Networks for Joint F0-Localisation Estimation

A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to become two-dimensional to allow periodicity analysis to be performed at each best ITD. Thus, one axis of the RTNN represents F0 and the other represents ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels, without recourse to the across-channel estimates of F0 or ITD that are commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
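As a rough illustration of the joint ITD-F0 idea within a single frequency channel, the sketch below builds a two-dimensional grid with one axis for ITD and one for pitch period. It is not the authors' RTNN: the recurrent delay loops are replaced here by a plain autocorrelation of the coincidence output, and the toy source `pulse`, the function names `itd_period_grid` and `best_cell`, the lag ranges, and the use of sample counts as units are all assumptions made for the example.

```python
def pulse(t, period=20):
    """Hypothetical toy source: a two-sample pulse repeating every `period` samples."""
    phase = t % period
    if phase == 0:
        return 1.0
    if phase == 1:
        return 0.5
    return 0.0


def itd_period_grid(left, right, itd_lags, periods):
    """Return grid[i][j], a score for ITD lag itd_lags[i] and period periods[j].

    Stage 1 (coincidence layer stand-in): multiply the left signal by the
    right signal shifted by a candidate ITD lag d.
    Stage 2 (periodicity stand-in): autocorrelate that coincidence output
    at each candidate period p. A real RTNN would use recurrent delay loops.
    """
    n = len(left)
    grid = []
    for d in itd_lags:
        coinc = [left[t] * right[t + d] if 0 <= t + d < n else 0.0
                 for t in range(n)]
        row = [sum(coinc[t] * coinc[t - p] for t in range(p, n))
               for p in periods]
        grid.append(row)
    return grid


def best_cell(grid, itd_lags, periods):
    """Return the (itd_lag, period) pair with the highest score in the grid."""
    _, best = max(
        (score, (d, p))
        for d, row in zip(itd_lags, grid)
        for p, score in zip(periods, row)
    )
    return best


# One source with a 20-sample period; the right ear lags the left by 3 samples.
n = 200
true_itd = 3
left = [pulse(t) for t in range(n)]
right = [pulse(t - true_itd) for t in range(n)]

itd_lags = list(range(-5, 6))   # candidate ITDs, in samples
periods = list(range(10, 31))   # candidate pitch periods, in samples

grid = itd_period_grid(left, right, itd_lags, periods)
best = best_cell(grid, itd_lags, periods)
print(best)  # (3, 20)
```

With a known sample rate, the winning period converts to F0 as sample rate divided by period. Two sources that differ in either ITD or period would produce separate peaks in the same grid, which is the property the abstract relies on for segregation.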
Stuart N. Wrigley, Guy J. Brown
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference
Year 2007
Where MLMI
Authors Stuart N. Wrigley, Guy J. Brown