Localization of non-linguistic events in spontaneous speech by Non-Negative Matrix Factorization and Long Short-Term Memory

Features generated by Non-Negative Matrix Factorization (NMF) have been successfully introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach that integrates likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
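The abstract does not spell out how the NMF features are computed, so the following is only a rough sketch of the general idea: given a spectrogram V and a fixed dictionary W, estimate non-negative activations H with V ≈ W H and normalize them per frame into likelihood-like scores that a sequence model such as a BLSTM could consume. The Euclidean cost, the fixed random dictionary, the per-frame normalization, and the names `nmf_activations` and `frame_features` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate non-negative H with V ~= W @ H while keeping W fixed.

    Assumption: standard multiplicative update for the Euclidean cost;
    the paper may use a different cost function.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))  # positive random start
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def frame_features(H, eps=1e-12):
    """Scale each frame's activations to sum to 1 (likelihood-like scores)."""
    return H / (H.sum(axis=0, keepdims=True) + eps)

# Toy data (hypothetical): 6 frequency bins, 3 dictionary atoms, 5 frames.
rng = np.random.default_rng(1)
W = rng.random((6, 3))        # stand-in for a pre-trained dictionary
H_true = rng.random((3, 5))
V = W @ H_true                # synthetic non-negative "spectrogram"

H = nmf_activations(V, W)
F = frame_features(H)         # one feature vector per frame (column)
```

In a tandem setup, each column of `F` would be passed, frame by frame, to the recurrent network as part of its input; that stage is not sketched here.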

Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll

ICASSP 2011 | Noise-robust Speech Recognition | Phoneme-HMM-based Speech Recognizer | Robust Speech Processing | Signal Processing |

Post Info

Added	21 Aug 2011
Updated	21 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll
