This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontane...
LC-STAR II is a follow-up project of the EU funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexi...
Ute Ziegenhain, Hanne Fersoe, Henk van den Heuvel,...
Abstract—According to characteristics of Mongolian wordformation, a method for removing inflectional suffixes from word images of the Mongolian Kanjur is proposed in this paper. ...
Abstract. Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domaindependent and there are m...
"Word" is difficult to define in the languages that do not exhibit explicit word boundary, such as Thai. Traditional methods on defining words for this kind of languages...