Word Segmentation of Vietnamese Texts: a Comparison of Approaches


This paper compares three word segmentation systems for Vietnamese. Most Vietnamese words are built by semantic composition from about 7,000 syllables, which also have meaning as isolated words. Identifying word boundaries in a text is therefore not a simple task, and ambiguities often arise. Beyond presenting the tested systems, we propose a standard definition of word segmentation in Vietnamese and introduce a reference corpus developed for evaluating this task. The results confirm that segmentation can be handled relatively well by automatic means, although a solution is still needed for out-of-vocabulary words.



Education | LREC 2008 | Segmentation Systems | Vietnamese Language | Vietnamese Words |


» Vietnamese Word Segmentation

» Word-based and Character-based Word Segmentation Models: Comparison and Combination

» Statistical-Based Approach to Word Segmentation

» Incremental Joint Approach to Word Segmentation POS Tagging and Dependency Parsing in Chin...

» Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and ...

» A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers

» Comparison of Some Thresholding Algorithms for TextBackground Segmentation in Difficult Do...

» Japanese Word Segmentation by Hidden Markov Model

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Quang Thang Dinh, Hong Phuong Le, Thi Minh Huyen Nguyen, Cam-Tu Nguyen, Mathias Rossignol, Xuân Luong Vu
