Learning Sub-Word Units for Open Vocabulary Speech Recognition

Large-vocabulary speech recognition systems fail to recognize words outside their vocabulary, many of which are information-rich terms such as named entities or foreign words. Hybrid word/sub-word systems address this problem by adding sub-word units to large-vocabulary word-based systems; new words can then be represented as combinations of sub-word units. Previous work created the sub-word lexicon heuristically from phonetic representations of text, using simple statistics to select common phone sequences. We propose a probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on English Broadcast News and MIT Lectures tasks, respectively.
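As a rough illustration of the heuristic baseline the abstract mentions (selecting common phone sequences by simple statistics), the sketch below counts phone n-grams in a toy pronunciation list and keeps the most frequent ones as candidate sub-word units. It is not the paper's probabilistic model, and the function name `frequent_phone_ngrams`, its parameters, and the toy pronunciations are all assumptions made for this example.

```python
from collections import Counter


def frequent_phone_ngrams(pronunciations, max_len=3, top_k=5):
    """Return the top_k most frequent phone n-grams of length 1..max_len.

    Hypothetical helper: a frequency-based stand-in for the heuristic
    sub-word selection described in the abstract, not the learned model.
    """
    counts = Counter()
    for phones in pronunciations:
        for n in range(1, max_len + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    # Sort by descending count, then by the n-gram itself so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ngram for ngram, _ in ranked[:top_k]]


# Toy pronunciations (made-up phone strings, for illustration only).
toy_lexicon = [
    ["k", "ae", "t"],
    ["b", "ae", "t"],
    ["k", "ae", "b"],
]
units = frequent_phone_ngrams(toy_lexicon, max_len=2, top_k=3)
print(units)
```

A real system would count over phonetic transcriptions of a large text corpus and would likely apply length and coverage constraints; the learned lexicon proposed in the paper replaces this frequency ranking with a task-optimized probabilistic model.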

Carolina Parada, Mark Dredze, Abhinav Sethy, Ariya Rastrow

ACL 2011 | Computational Linguistics | Phonetic Representations | Speech Recognition Systems | Word Lexicon

Post Info

Added	23 Aug 2011
Updated	23 Aug 2011
Type	Journal
Year	2011
Where	ACL
Authors	Carolina Parada, Mark Dredze, Abhinav Sethy, Ariya Rastrow
