An Unsupervised Learning and Statistical Approach for Vietnamese Word Recognition and Segmentation

This paper addresses two topics: (i) recognizing Vietnamese words and segmenting sentences into words using probabilistic models; and (ii) constructing the optimum probabilistic model through an unsupervised learning process. For each probabilistic model, new words are recognized and their syllables are linked together. The syllable-linking process improves the accuracy of the statistical functions, which in turn improves the recognition of new words. Hence, the probabilistic model converges to the optimum one. Our experimental corpus is generated from about 250,000 online news articles, which consist of about 19,000,000 sentences. The accuracy of the segmentation algorithm is over 90%. Our Vietnamese word and phrase dictionary contains more than 150,000 entries.
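The abstract does not specify the statistical functions or the learning procedure, so the following is only a minimal sketch of the general idea of iterative syllable linking, not the authors' method. It assumes pointwise mutual information (PMI) as the association score, a fixed threshold, and greedy left-to-right linking; the function names `pmi_table`, `link_pass`, and `segment` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Return a PMI score for every adjacent token pair in the corpus.

    PMI is an assumed stand-in for the paper's statistical functions.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def link_pass(sentences, scores, threshold):
    """Greedily join adjacent tokens whose score exceeds the threshold."""
    result = []
    for tokens in sentences:
        merged = []
        i = 0
        while i < len(tokens):
            pair = (tokens[i], tokens[i + 1]) if i + 1 < len(tokens) else None
            if pair is not None and scores.get(pair, float("-inf")) > threshold:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        result.append(merged)
    return result


def segment(sentences, threshold=1.0, max_rounds=5):
    """Alternate between re-estimating statistics and linking syllables.

    Each round recounts statistics over the already linked tokens, so the
    scores of later rounds reflect the words found in earlier rounds.
    Stops when a round changes nothing or after max_rounds.
    """
    current = [s.split() for s in sentences]
    for _ in range(max_rounds):
        scores = pmi_table(current)
        linked = link_pass(current, scores, threshold)
        if linked == current:
            break
        current = linked
    return current
```

Calling `segment` on a list of whitespace-separated syllable strings returns one token list per sentence, with linked syllables joined by an underscore. The threshold and round limit are arbitrary illustrative values; on a real corpus they would need tuning, and a stopping rule based on convergence of the model would be closer to what the abstract describes.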

Hieu Le Trung, Vu Le Anh, Kien Le Trung

ACIIDS 2010 | Database | Optimum Probabilistic Model | Probabilistic Model | Vietnamese Word |

Post Info

Added	10 Jul 2010
Updated	10 Jul 2010
Type	Conference
Year	2010
Where	ACIIDS
Authors	Hieu Le Trung, Vu Le Anh, Kien Le Trung
