Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates

Despite years of speech recognition research, little is known about which words tend to be misrecognized and why. Previous work has shown that errors increase for infrequent words, short words, and very loud or fast speech, but many other presumed causes of error (e.g., nearby disfluencies, turn-initial words, phonetic neighborhood density) have never been carefully tested. The reasons for the huge differences found in error rates between speakers also remain largely mysterious. Using a mixed-effects regression model, we investigate these and other factors by analyzing the errors of two state-of-the-art recognizers on conversational speech. Words with higher error rates include those with extreme prosodic characteristics, those occurring turn-initially or as discourse markers, and doubly confusable pairs: acoustically similar words that also have similar language model probabilities. Words preceding disfluent interruption points (first repetition tokens and words before fragments) also...

Sharon Goldwater, Daniel Jurafsky, Christopher D. Manning

Error Rate | Neighborhood Density | Phonetic Neighborhood | Security Privacy | SPEECH 2010 |

Post Info

Added	21 May 2011
Updated	21 May 2011
Type	Journal
Year	2010
Where	SPEECH
Authors	Sharon Goldwater, Daniel Jurafsky, Christopher D. Manning

Sciweavers
