Sciweavers

NLPRS
2001
Springer

Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs

This paper describes a method of extracting katakana words and phrases, along with their English counterparts, from non-aligned monolingual web search engine query logs. The method employs a trainable edit distance function to find <katakana, English> pairs that have a high probability of being equivalent. These pairs can then be used to bootstrap further training of the edit distance function, resulting in improved back-transliteration from katakana to English. In addition, this is an effective method for mining large numbers of katakana strings to enhance a bilingual lexicon. The improved edit distance function and enhanced lexicon can be used for more accurate alignment of bitexts, and at runtime in machine translation (MT) and multilingual information retrieval (IR).
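To make the pairing idea concrete, here is a minimal sketch of a weighted edit distance used to match katakana terms to English terms. It is an illustration under stated assumptions, not the authors' implementation: it assumes the katakana strings have already been romanized, the substitution costs are fixed by hand rather than trained, and the names `weighted_edit_distance`, `harvest_pairs`, and `max_normalized_cost` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical type: cost of substituting one character for another.
SubCosts = Dict[Tuple[str, str], float]


def weighted_edit_distance(
    source: str,
    target: str,
    sub_costs: Optional[SubCosts] = None,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance with per-pair substitution costs (default cost 1.0)."""
    sub_costs = sub_costs or {}
    m, n = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_costs.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,      # delete a
                dist[i][j - 1] + ins_cost,      # insert b
                dist[i - 1][j - 1] + sub,       # substitute a -> b
            )
    return dist[m][n]


def harvest_pairs(
    romanized_katakana: List[str],
    english_terms: List[str],
    sub_costs: Optional[SubCosts] = None,
    max_normalized_cost: float = 0.35,  # assumed threshold, chosen arbitrarily
) -> List[Tuple[str, str, float]]:
    """Keep, for each katakana term, its closest English term if cheap enough."""
    if not english_terms:
        return []
    pairs = []
    for kana in romanized_katakana:
        best = min(
            english_terms,
            key=lambda eng: weighted_edit_distance(kana, eng, sub_costs),
        )
        cost = weighted_edit_distance(kana, best, sub_costs)
        normalized = cost / max(len(kana), len(best), 1)
        if normalized <= max_normalized_cost:
            pairs.append((kana, best, normalized))
    return pairs
```

In a bootstrapping setup like the one the abstract describes, pairs accepted by a step such as `harvest_pairs` would presumably feed back into re-estimating the substitution costs; that training step is not shown here.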
Eric Brill, Gary Kacmarcik, Chris Brockett
Added 30 Jul 2010
Updated 30 Jul 2010
Type Conference
Year 2001
Where NLPRS