Learning from human errors: prediction of phoneme confusions based on modified ASR training

In an attempt to improve models of human perception, the recognition of phonemes in nonsense utterances was predicted with automatic speech recognition (ASR) to analyze its applicability for modeling human speech recognition (HSR) in noise. In a first set of experiments, several feature types were used as input to an ASR system, and the resulting phoneme scores were compared with listening experiments that used the same speech data. With conventional training, the highest correlation between predicted and measured recognition was observed for perceptual linear prediction (PLP) features (r = 0.84). Second, a new training paradigm for ASR is proposed with the aim of improving the prediction of phoneme intelligibility. For this perceptual training, the original utterance labels are modified based on the confusions measured in HSR tests. The modified ASR training improved the overall prediction, with the best models (r = 0.89) exceeding those obtained with conventional training (r = 0.80).
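The two ideas in the abstract, scoring a model by the correlation between predicted (ASR) and measured (HSR) phoneme recognition, and relabeling training data using human confusions, can be illustrated with a small Python sketch. This is not the authors' implementation: the function names `pearson_r` and `perceptual_labels` are hypothetical, and the relabeling strategy (drawing each new label from the listeners' response distribution for the presented phoneme) is only one assumed way of "modifying labels based on confusions".

```python
import random
from statistics import mean


def pearson_r(predicted, measured):
    """Pearson correlation between per-phoneme recognition scores
    predicted by ASR and measured in listening (HSR) tests."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mx, my = mean(predicted), mean(measured)
    cov = sum((x - mx) * (y - my) for x, y in zip(predicted, measured))
    sx = sum((x - mx) ** 2 for x in predicted) ** 0.5
    sy = sum((y - my) ** 2 for y in measured) ** 0.5
    return cov / (sx * sy)


def perceptual_labels(labels, confusions, rng):
    """Hypothetical relabeling step (an assumption, not the paper's exact method):
    replace each presented phoneme with a response sampled from the human
    confusion row for that phoneme; phonemes without data keep their label."""
    new_labels = []
    for phoneme in labels:
        row = confusions.get(phoneme)  # e.g. {"b": 0.7, "d": 0.3}
        if not row:
            new_labels.append(phoneme)
            continue
        responses = list(row)
        weights = [row[r] for r in responses]
        new_labels.append(rng.choices(responses, weights=weights, k=1)[0])
    return new_labels


# Example: listeners always heard the presented /b/ as /d/.
rng = random.Random(0)
print(perceptual_labels(["b", "a"], {"b": {"b": 0.0, "d": 1.0}}, rng))
print(pearson_r([0.9, 0.6, 0.3], [0.8, 0.7, 0.2]))
```

Sampling from the confusion rows keeps the relabeled training set's error statistics close to those observed in listeners; a deterministic alternative would be to use each listener's actual response as the label.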

Bernd T. Meyer, Birger Kollmeier

Automatic Speech Recognition | Conventional Training | INTERSPEECH 2010 | Signal Processing | Training

Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Conference
Year	2010
Where	INTERSPEECH
Authors	Bernd T. Meyer, Birger Kollmeier

Sciweavers