Sciweavers

ACL
2007

Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition

14 years 1 months ago
Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition
Speech recognition in many morphologically rich languages suffers from a very high out-of-vocabulary (OOV) ratio. Earlier work has shown that vocabulary decomposition methods can practically solve this problem for a subset of these languages. This paper compares various vocabulary decomposition approaches to open vocabulary speech recognition, using Estonian speech recognition as a benchmark. Comparisons are performed utilizing large models of 60000 lexical items and smaller vocabularies of 5000 items. A large vocabulary model based on a manually constructed morphological tagger is shown to give the lowest word error rate, while the unsupervised morphology discovery method Morfessor Baseline gives marginally weaker results. Only the Morfessor-based approach is shown to adequately scale to smaller vocabulary sizes.
Antti Puurula, Mikko Kurimo
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ACL
Authors Antti Puurula, Mikko Kurimo
Comments (0)