Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs

This paper describes a method of extracting katakana words and phrases, along with their English counterparts, from non-aligned monolingual web search engine query logs. The method employs a trainable edit distance function to find <katakana, English> pairs that have a high probability of being equivalent. These pairs can then be used to bootstrap further training of the edit distance function, resulting in improved back-transliteration from katakana to English. In addition, this is an effective method for mining large numbers of katakana strings to enhance a bilingual lexicon. The improved edit distance function and enhanced lexicon can be used for more accurate alignment of bitexts, and at runtime in MT and multilingual IR.
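
As a rough illustration of an edit distance whose costs can be tuned, the following Python sketch computes a weighted character-level edit distance between a romanized katakana string and an English candidate. It is not the paper's model: the romanization step, the `costs` table, its values, and the function names are all assumptions made for this example, and the abstract does not specify how edits are defined or trained.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical cost table: (source_char, target_char) -> cost.
# The empty string "" stands for "no character" (insertion or deletion).
Costs = Dict[Tuple[str, str], float]


def weighted_edit_distance(src: str, tgt: str, costs: Costs,
                           default: float = 1.0) -> float:
    """Edit distance where each operation's cost is looked up in `costs`.

    Matching characters cost 0; any operation missing from the table
    falls back to `default`.
    """
    def cost(a: str, b: str) -> float:
        if a == b:
            return 0.0
        return costs.get((a, b), default)

    m, n = len(src), len(tgt)
    # d[i][j] = cheapest way to turn src[:i] into tgt[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(src[i - 1], "")      # deletions only
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost("", tgt[j - 1])      # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(src[i - 1], ""),               # delete
                d[i][j - 1] + cost("", tgt[j - 1]),               # insert
                d[i - 1][j - 1] + cost(src[i - 1], tgt[j - 1]),   # substitute
            )
    return d[m][n]


def best_match(src: str, candidates: Iterable[str], costs: Costs) -> str:
    """Return the candidate with the lowest weighted distance to `src`."""
    return min(candidates, key=lambda c: weighted_edit_distance(src, c, costs))
```

With an empty cost table this reduces to ordinary Levenshtein distance. Lowering the cost of a substitution such as `("r", "l")` makes pairs that differ only in that way score as closer, which is the kind of adjustment a training procedure could make from harvested pairs.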

Eric Brill, Gary Kacmarcik, Chris Brockett

Edit Distance Function | Katakana | Natural Language Processing | NLPRS 2001 | Trainable Edit Distance

Added	30 Jul 2010
Updated	30 Jul 2010
Type	Conference
Year	2001
Where	NLPRS
Authors	Eric Brill, Gary Kacmarcik, Chris Brockett
