Automatically finding semantically consistent n-grams to add new words in LVCSR systems

This paper presents a new method for automatically adding n-grams that contain out-of-vocabulary (OOV) words to a baseline language model (LM). The added n-grams are meant to be grammatically correct and consistent with the meaning of the OOV words. The method first determines the word sequences, i.e., n-grams, in which the usage of a given OOV word is the most semantically consistent, and then computes conditional probabilities for these n-grams. To do this, semantic relations between words are used to assimilate each OOV word to several equivalent in-vocabulary words. Based on these equivalent words, n-grams from the baseline LM are reused to find the word sequences to be added and to compute their probabilities. After augmenting the vocabulary and running a recognition process, experiments show that our method yields WER improvements comparable to those obtained using a state-of-the-art open-vocabulary LM.
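The substitution idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the equivalent words are weighted or how the probabilities are combined, so the weighted average used here is an assumption, as are the function name `candidate_ngrams`, the dictionary format of the baseline n-grams, and the toy data. The similarity weights are taken as given.

```python
from collections import defaultdict


def candidate_ngrams(baseline_ngrams, oov_word, equivalents):
    """Propose n-grams for an OOV word by reusing baseline n-grams.

    baseline_ngrams: dict mapping n-gram tuples to conditional probabilities
                     (hypothetical format, chosen for this sketch).
    oov_word:        the out-of-vocabulary word to add.
    equivalents:     dict mapping in-vocabulary words to similarity weights
                     (assumed to be given by some semantic relatedness measure).
    """
    total_weight = sum(equivalents.values())
    scores = defaultdict(float)
    for ngram, prob in baseline_ngrams.items():
        for position, word in enumerate(ngram):
            weight = equivalents.get(word)
            if weight is None:
                continue  # this word is not an equivalent of the OOV word
            # Substitute the OOV word for the equivalent in-vocabulary word.
            new_ngram = ngram[:position] + (oov_word,) + ngram[position + 1:]
            # Assumption: the score is a similarity-weighted average of the
            # probabilities of the baseline n-grams it was derived from.
            scores[new_ngram] += (weight / total_weight) * prob
    return dict(scores)


# Toy data, invented for illustration only.
baseline = {
    ("visit", "paris"): 0.2,
    ("visit", "london"): 0.1,
    ("paris", "is"): 0.4,
    ("the", "cat"): 0.5,
}
new_entries = candidate_ngrams(baseline, "lyon", {"paris": 0.75, "london": 0.25})
for ngram, score in sorted(new_entries.items()):
    print(" ".join(ngram), round(score, 3))
```

The scores produced this way are not renormalized, so a real system would still need to redistribute probability mass (for example through back-off weights) before the augmented LM is used for recognition.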

Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot

Baseline Language Model | ICASSP 2011 | OOV Word | Signal Processing | Word Sequences

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot

Sciweavers
