Selection of Japanese-English Equivalents by Integrating High-quality Corpora and Huge Amounts of Web Data

As a first step toward developing systems that enable non-native speakers to output near-perfect English sentences from given mixed English-Japanese sentences, we propose new approaches for selecting English equivalents by using the number of hits for various contexts in large English corpora. As the large English corpora, we used not only huge amounts of Web data but also manually compiled, large, high-quality English corpora. Using high-quality corpora enables us to select equivalents accurately, and using huge amounts of Web data enables us to resolve the shortage of hits that normally occurs when only high-quality corpora are used. The types and lengths of the contexts used to select equivalents are variable and are determined optimally according to the number of hits in the corpora, which further refines performance. Computer experiments showed that the precision of our methods was much higher than that of existing methods for equivalent selection.
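The abstract describes the selection procedure only at a high level. The following is a minimal sketch of the general idea of hit-count-based selection with corpus backoff and variable context length, not the authors' actual algorithm. Everything in it is an assumption for illustration: the function name `select_equivalent`, the toy hit-count dictionaries standing in for corpus queries, the `min_hits` threshold, and the search order (most specific context first, and for each context the high-quality corpus before the Web data).

```python
from typing import Dict, Optional, Sequence, Tuple


def select_equivalent(
    candidates: Sequence[str],
    context_templates: Sequence[str],
    corpora: Sequence[Dict[str, int]],
    min_hits: int = 1,
) -> Optional[Tuple[str, str]]:
    """Hypothetical sketch: pick the English candidate with the most hits.

    candidates        -- English equivalents for one Japanese word
    context_templates -- context phrases with a "{}" slot, most specific first
    corpora           -- hit-count tables (phrase -> hits), assumed to be ordered
                         high-quality corpus first, Web data last
    min_hits          -- assumed threshold: a context/corpus pair is trusted only
                         if all candidates together reach this many hits
    Returns (chosen candidate, context template used), or None if no context
    yields enough hits in any corpus.
    """
    for template in context_templates:        # try longer, more specific contexts first
        for counts in corpora:                # back off from high-quality data to Web data
            hits = {c: counts.get(template.format(c), 0) for c in candidates}
            if sum(hits.values()) >= min_hits:
                # Enough evidence: choose the candidate with the most hits
                # (ties go to the earlier candidate in the list).
                best = max(candidates, key=lambda c: hits[c])
                return best, template
    return None


# Toy hit counts (invented for illustration, not data from the paper).
high_quality = {"take a photo": 120, "shoot a photo": 5}
web = {
    "take a photo of us": 900,
    "shoot a photo of us": 40,
    "take a photo": 50000,
    "shoot a photo": 3000,
}

result = select_equivalent(
    candidates=["take", "shoot"],
    context_templates=["{} a photo of us", "{} a photo"],
    corpora=[high_quality, web],
)
print(result)
```

In this toy run the long context "{} a photo of us" has no hits in the high-quality table, so the sketch falls back to the Web counts for that same context and returns `('take', '{} a photo of us')`. Trying the high-quality corpus first for each context is one possible way to combine accuracy with coverage; the paper may order or weight the corpora differently.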

Qing Ma, Koichi Nakao, Masaki Murata, Hitoshi Isahara

Education | Huge Amounts | Large English Corpora | LREC 2008 | Near-perfect English Sentences

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Qing Ma, Koichi Nakao, Masaki Murata, Hitoshi Isahara
