Learning the lexicon from raw texts for open-vocabulary Korean word recognition



In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Because Korean morphology is complex, lexicons based on linguistic entities such as words and morphemes are ill-suited to open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model found the correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
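The two ideas in the abstract, a lexicon of variable-length character sequences gathered from raw text and a search over a lattice of character candidates, can be illustrated with a minimal sketch. This is not the authors' dynamic Bayesian network model: as a stand-in it scores sequences by raw relative frequency, and the function names `collect_sequences` and `best_path`, the `max_len` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def collect_sequences(texts, max_len=3):
    """Count every character sequence of length 1..max_len inside each
    whitespace-delimited token. Assumption: plain frequency counts stand
    in for the paper's dynamic Bayesian network model."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            for i in range(len(token)):
                for j in range(i + 1, min(i + max_len, len(token)) + 1):
                    counts[token[i:j]] += 1
    return counts


def best_path(lattice, counts, max_len=3):
    """Pick one character per lattice position so that the result splits
    into lexicon sequences with the highest total log relative frequency.
    lattice is a list of candidate-character lists, one per position.
    Returns None if no segmentation into known sequences exists."""
    total = sum(counts.values())
    n = len(lattice)
    # best[k] holds (score, string) for the best cover of positions 0..k-1
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_text = best[start]
            if prev_score == -math.inf:
                continue
            # Enumerate every candidate string spanning start..end-1
            for chars in product(*lattice[start:end]):
                chunk = "".join(chars)
                freq = counts.get(chunk, 0)
                if freq == 0:
                    continue
                score = prev_score + math.log(freq / total)
                if score > best[end][0]:
                    best[end] = (score, prev_text + chunk)
    return best[n][1] or None


# Toy corpus and a lattice with one confusable alternative per position
corpus = ["한국어 인식", "한국어 단어 인식"]
lexicon = collect_sequences(corpus)
lattice = [["한", "힌"], ["국", "굿"], ["어", "이"]]
result = best_path(lattice, lexicon)
```

Because each additional sequence adds a negative log term, this scoring favors covering the lattice with fewer, longer sequences; a real language model would condition each sequence on its context rather than score it independently.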

Sungho Ryu, Jin Hyung Kim


Document Analysis | ICDAR 2003 | Korean Word Recognition | Language Model | Word Recognition |


Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Sungho Ryu, Jin Hyung Kim
