Web-based acquisition of Japanese katakana variants


Download www.r.dl.itc.u-tokyo.ac.jp

This paper describes a method for detecting Japanese Katakana variants in a large corpus. Katakana words, which are mainly used for loanwords, cause problems in information retrieval and related tasks, because transliteration produces several spelling variations, all of which can be orthographic. Previous work manually defined Katakana rewrite rules, such as (be) and (ve), for generating variants, and also defined the weight of each edit operation needed to transform one string into another in order to detect these variants. However, this manual approach has not been able to keep up with the ever-increasing number of loanwords and their variants. In the method proposed in this paper, the weight of each edit operation is assigned mechanically based on Web data. In experiments, it performed almost as well as a method with manually determined weights. It also achieved 98.6% recall and 86.3% precision in the task of extracting Katakana variant pairs from a 38-year corpus of Japanese newspaper articles. Categories and Subject...
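
The abstract's core idea, comparing two strings by an edit distance whose operations carry different weights, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the strings are romanized rather than Katakana, the cost table `SUB_COST`, the default cost, and the threshold are made-up values, and the function names are hypothetical. The paper derives its weights from Web data, which this sketch does not attempt to reproduce.

```python
from typing import Dict, Tuple

# Hypothetical substitution costs (illustrative values, not from the paper).
# A cheap b<->v substitution mimics a rewrite rule such as (be) <-> (ve).
SUB_COST: Dict[Tuple[str, str], float] = {
    ("b", "v"): 0.2,
    ("v", "b"): 0.2,
}
DEFAULT_COST = 1.0  # assumed cost of any other insert, delete, or substitute


def weighted_edit_distance(
    a: str,
    b: str,
    sub_cost: Dict[Tuple[str, str], float] = SUB_COST,
    default: float = DEFAULT_COST,
) -> float:
    """Levenshtein distance where selected substitutions are cheaper."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * default for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * default]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                substitute = prev[j - 1]
            else:
                substitute = prev[j - 1] + sub_cost.get((ca, cb), default)
            delete = prev[j] + default
            insert = curr[j - 1] + default
            curr.append(min(delete, insert, substitute))
        prev = curr
    return prev[-1]


def is_variant_pair(a: str, b: str, threshold: float = 0.5) -> bool:
    """Treat two distinct strings as variants if their distance is small."""
    return a != b and weighted_edit_distance(a, b) <= threshold


if __name__ == "__main__":
    print(weighted_edit_distance("benechia", "venechia"))
    print(is_variant_pair("benechia", "venechia"))
    print(is_variant_pair("benechia", "benechiz"))
```

With these assumed weights, a pair differing only by a b/v substitution falls under the threshold, while a pair differing by an arbitrary character does not. Replacing the hand-written `SUB_COST` table with weights estimated from data is the step the abstract attributes to the proposed method.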

Takeshi Masuyama, Hiroshi Nakagawa


Japanese Katakana Variants | Katakana | Katakana Variants | SIGIR 2005 |


Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	SIGIR
Authors	Takeshi Masuyama, Hiroshi Nakagawa
