INTERSPEECH
2010

Learning from human errors: prediction of phoneme confusions based on modified ASR training

To improve models of human perception, automatic speech recognition (ASR) was used to predict the recognition of phonemes in nonsense utterances, and its applicability to modeling human speech recognition (HSR) in noise was analyzed. In the first experiments, several feature types were used as input to an ASR system, and the resulting phoneme scores were compared with results from listening experiments on the same speech data. With conventional training, the highest correlation between predicted and measured recognition was observed for perceptual linear prediction features (r = 0.84). Second, a new training paradigm for ASR is proposed with the aim of improving the prediction of phoneme intelligibility. For this perceptual training, the original utterance labels are modified based on the confusions measured in HSR tests. The modified ASR training improved the overall prediction, with the best models (r = 0.89) exceeding those obtained with conventional training (r = 0.84).
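The two quantitative steps named in the abstract, modifying training labels according to measured confusions and correlating predicted with measured recognition scores, can be illustrated with a minimal sketch. The abstract does not say how the labels are modified, so the sampling scheme below is an assumption, not the authors' procedure. The names relabel and pearson_r, the confusion table, and all numbers are hypothetical and chosen only for illustration.

```python
import math
import random


def relabel(labels, confusions, rng):
    """Replace each label with a phoneme drawn from the listener response
    distribution for that label (assumed scheme, not from the paper)."""
    modified = []
    for label in labels:
        responses = confusions.get(label)
        if not responses:
            # No listening data for this phoneme: keep the original label.
            modified.append(label)
            continue
        phonemes = list(responses)
        weights = [responses[p] for p in phonemes]
        modified.append(rng.choices(phonemes, weights=weights, k=1)[0])
    return modified


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical confusion data: presented phoneme -> listener response rates.
confusions = {"b": {"b": 0.7, "d": 0.3}, "s": {"s": 1.0}}
labels = ["b", "s", "b", "m"]
modified = relabel(labels, confusions, random.Random(0))

# Hypothetical per-phoneme recognition scores (ASR prediction vs. listeners).
predicted = [0.9, 0.6, 0.4, 0.8]
measured = [0.8, 0.7, 0.3, 0.9]
r = pearson_r(predicted, measured)
print(modified, round(r, 2))
```

In this sketch a phoneme that listeners never confuse keeps its label, while a frequently confused phoneme is sometimes relabeled as the confused response, so a model trained on the modified labels would inherit some of the listeners' error pattern.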
Bernd T. Meyer, Birger Kollmeier
Added 18 May 2011
Updated 18 May 2011
Type Conference
Year 2010
Where INTERSPEECH
Authors Bernd T. Meyer, Birger Kollmeier