Sciweavers

ICASSP
2011
IEEE

Localization of non-linguistic events in spontaneous speech by Non-Negative Matrix Factorization and Long Short-Term Memory

Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
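The abstract does not spell out how the NMF features are computed, so the following is only a minimal sketch of one common way to obtain frame-wise, class-related features from NMF. It assumes a fixed non-negative dictionary `W` whose columns are each assigned to one class (for example speech, laughter, noise), estimates activations with Euclidean multiplicative updates, and normalizes the per-class activation mass in each frame. The function names, the toy data, the class assignment, and the normalization are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative H with V ~= W @ H while keeping W fixed.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    the paper may use a different cost function).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


def class_features(H, labels, n_classes, eps=1e-12):
    """Sum activations per class and normalize each frame to sum to one."""
    feats = np.zeros((n_classes, H.shape[1]))
    for k in range(n_classes):
        feats[k] = H[labels == k].sum(axis=0)
    return feats / (feats.sum(axis=0, keepdims=True) + eps)


# Toy data: 20 frequency bins, 6 dictionary atoms (2 per hypothetical class),
# 15 frames. Real input would be a magnitude spectrogram.
rng = np.random.default_rng(1)
W = rng.random((20, 6)) + 0.1
labels = np.array([0, 0, 1, 1, 2, 2])
V = W @ rng.random((6, 15))

H_short = nmf_activations(V, W, n_iter=1)
H = nmf_activations(V, W, n_iter=200)
feats = class_features(H, labels, n_classes=3)

err_short = np.linalg.norm(V - W @ H_short)
err = np.linalg.norm(V - W @ H)
```

In a tandem setup like the one described, a per-frame feature matrix such as `feats` would be passed, possibly alongside conventional acoustic features, to a sequence classifier; the BLSTM-RNN itself is not sketched here.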
Added 21 Aug 2011
Updated 21 Aug 2011
Type Conference
Year 2011
Where ICASSP
Authors Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll