Sciweavers

ICASSP
2011
IEEE

Using morpheme and syllable based sub-words for polish LVCSR

13 years 3 months ago
Using morpheme and syllable based sub-words for polish LVCSR
Polish is a synthetic language with a high morpheme-perword ratio. It makes use of a high degree of inflection leading to high out-of-vocabulary (OOV) rates, and high Language Model (LM) perplexities. This poses a challenge for Large Vocabulary and Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme and syllable based units is investigated for building sub-lexical LMs. A different type of sub-lexical units is proposed based on combining morphemic or syllabic units with corresponding pronunciations. Thereby, a set of grapheme-phoneme pairs called graphones are used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full-words.
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schl
Added 20 Aug 2011
Updated 20 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
Comments (0)