Recovery of Rare Words in Lecture Speech


Download www.fit.vutbr.cz

The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, each of which might appear only a few times in particular documents. In most documents, however, these rare words are not seen at all. Words of the first type are typically included in the language model of an automatic speech recognizer (ASR) and are thus widely referred to as in-vocabulary (IV). Words of the second type are missing from the language model and are thus called out-of-vocabulary (OOV). However, these words usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and describe them as sequences of sub-words. We detected about one third of all OOV words, and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detection...
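As a rough illustration of the IV/OOV distinction drawn in the abstract, the Python sketch below splits a token sequence by membership in a recognizer vocabulary and computes the resulting OOV rate. The toy vocabulary, the sample tokens, and the function names `split_iv_oov` and `oov_rate` are hypothetical and not taken from the paper. The sketch only labels words against a known word list; it does not implement the hybrid word/sub-word detection or the P2G conversion the authors describe.

```python
from typing import Iterable, List, Set, Tuple


def split_iv_oov(tokens: Iterable[str], vocabulary: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition tokens into in-vocabulary (IV) and out-of-vocabulary (OOV) lists."""
    iv: List[str] = []
    oov: List[str] = []
    for token in tokens:
        word = token.lower()  # assumption: the vocabulary is stored lowercased
        if word in vocabulary:
            iv.append(word)
        else:
            oov.append(word)
    return iv, oov


def oov_rate(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that are missing from the vocabulary (0.0 for empty input)."""
    iv, oov = split_iv_oov(tokens, vocabulary)
    total = len(iv) + len(oov)
    return len(oov) / total if total else 0.0


# Hypothetical toy data, for illustration only.
VOCAB = {"the", "model", "is", "trained", "on", "speech"}
TOKENS = "the model is trained on eigenvoice speech".split()

if __name__ == "__main__":
    iv_words, oov_words = split_iv_oov(TOKENS, VOCAB)
    print("OOV words:", oov_words)
    print("OOV rate:", oov_rate(TOKENS, VOCAB))
```

In a real recognizer the reference transcript is not available at decoding time, which is why the paper detects OOV regions from the recognizer output instead of by lookup as done here.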

Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hynek Hermansky


Language Model | OOV Words | Rare Words | Signal Processing | TSD 2010


Added	31 Jan 2011
Updated	31 Jan 2011
Type	Journal
Year	2010
Where	TSD
Authors	Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hynek Hermansky
