Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus



Abstract. Katakana, the Japanese phonogram script used mainly for loan words, is a troublemaker in Japanese word segmentation. Because Katakana words are heavily domain-dependent and Katakana neologisms are numerous, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium- or large-sized Japanese corpus of some domain.
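The abstract does not spell out the segmentation algorithm, so the sketch below is only a rough illustration of corpus-driven Katakana compound splitting. It is not the method of this paper: it swaps in the well-known frequency-based splitter of Koehn and Knight (geometric mean of part frequencies, originally proposed for German compounds). It assumes the corpus is a plain string, that a "Katakana word" is a maximal run of characters in U+30A0 to U+30FF, and that a part is valid only if it also occurs in the corpus as a standalone run. The names `katakana_counts`, `split_compound`, and `min_len` are hypothetical.

```python
import re
from collections import Counter
from math import prod

# Assumption: Katakana = Unicode block U+30A0..U+30FF (includes the long-vowel mark U+30FC).
KATAKANA_RUN = re.compile(r"[\u30A0-\u30FF]+")


def katakana_counts(corpus: str) -> Counter:
    """Count maximal Katakana runs in the corpus (hypothetical lexicon source)."""
    return Counter(KATAKANA_RUN.findall(corpus))


def split_compound(word: str, counts: Counter, min_len: int = 2) -> list:
    """Hypothetical sketch, NOT the paper's method: Koehn & Knight-style splitting.

    Chooses the segmentation whose parts have the highest geometric-mean corpus
    frequency. Each part must occur as a standalone Katakana run. The unsplit
    word competes with its own frequency, so frequent words stay intact.
    """
    best_parts, best_score = [word], float(counts[word])

    def search(start: int, parts: list) -> None:
        nonlocal best_parts, best_score
        if start == len(word):
            if len(parts) > 1:  # the unsplit case is already scored above
                score = prod(counts[p] for p in parts) ** (1 / len(parts))
                if score > best_score:
                    best_parts, best_score = list(parts), score
            return
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if counts[part] > 0:  # Counter returns 0 for unseen keys
                parts.append(part)
                search(end, parts)
                parts.pop()

    search(0, [])  # exhaustive search; fine for short compounds
    return best_parts


# Usage on a toy corpus (hypothetical data).
corpus = "データを保存する。ベースとなるシステム。データベースシステムを使う。データとシステム。"
counts = katakana_counts(corpus)
print(split_compound("データベースシステム", counts))  # ['データ', 'ベース', 'システム']
```

Here the compound occurs once, but its parts データ (2), ベース (1), and システム (2) give a geometric mean of about 1.59, so the split wins. Collecting the parts chosen for every Katakana run in a corpus would yield a candidate lexicon; how the paper actually scores and filters candidates is not described in this abstract.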

Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi


IJCNLP 2005 | Japanese Katakana Compounds | Katakana Neologisms | Katakana Word | Natural Language Processing


Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	IJCNLP
Authors	Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Sciweavers
