Reliable Measures for Aligning Japanese-English News Articles and Sentences


We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments. To remove these, we propose two measures (scores) that evaluate the validity of alignments. The measure for article alignment uses similarities in sentences aligned by DP matching and that for sentence alignment uses similarities in articles aligned by CLIR. They enhance each other to improve the accuracy of alignment. Using these measures, we have successfully constructed a large-scale article and sentence alignment corpus available to the public.
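To illustrate the DP-matching step mentioned in the abstract, here is a minimal Python sketch of sentence alignment by dynamic programming. It is not the authors' implementation: the allowed alignment patterns in `BEADS`, the `skip_penalty` value, and the toy `overlap_sim` similarity are all assumptions made for illustration. A real Japanese-English system would need a cross-lingual similarity (for example, one based on a bilingual dictionary) in place of `overlap_sim`.

```python
from typing import Callable, List, Tuple

# Allowed alignment patterns as (source sentences, target sentences).
# This particular set is an assumption for the sketch.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def overlap_sim(a: str, b: str) -> float:
    """Toy similarity (assumption): Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align_sentences(
    src: List[str],
    tgt: List[str],
    sim: Callable[[str, str], float] = overlap_sim,
    skip_penalty: float = -0.1,  # hypothetical cost of leaving a sentence unaligned
) -> Tuple[float, List[Tuple[List[int], List[int]]]]:
    """Return the best total score and the aligned index groups."""
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    gain = skip_penalty
                else:
                    gain = sim(" ".join(src[i:ni]), " ".join(tgt[j:nj]))
                score = best[i][j] + gain
                if score > best[ni][nj]:
                    best[ni][nj] = score
                    back[ni][nj] = (i, j)

    # Follow back-pointers from the end to recover the alignment.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return best[n][m], beads
```

Each returned pair lists the source and target sentence indices grouped together, with an empty list marking a sentence left unaligned. The per-pair similarities computed along the best path are the kind of signal that a validity score for the enclosing article pair could draw on, as the abstract describes, though the exact measures are defined in the paper and not reproduced here.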

Masao Utiyama, Hitoshi Isahara


ACL 2003 | ACL 2007 | Alignment Uses Similarities | Large Parallel Corpus | Sentence Alignment |


Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	ACL
Authors	Masao Utiyama, Hitoshi Isahara
