Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm

We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition, joining the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of finding cognates for the Hungarian-English language pair is extremely low, we adopted a novel approach built on Named Entity recognition. Thanks to the well-selected anchors, the algorithm was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair.

Key words: sentence segmentation, sentence alignment, hybrid method, length-based alignment, Named Entity recognition, anchor, cognates, dynamic programming
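The abstract gives no implementation details, so the Python sketch below is only a rough illustration of how a length-based dynamic-programming aligner can be combined with anchor matching. It is not the authors' algorithm. Each candidate bead (1-1, 1-0, 0-1, 2-1, 1-2) receives a Gale-Church-style length cost, and that cost is lowered for every named entity the two sides share. The bead priors, the constants `C` and `S2`, the `anchor_weight` of 2.0, and the capitalised-token/number heuristic in the hypothetical `entities` helper (standing in for a real Named Entity recogniser) are all assumptions. The prefix test in `same_entity` is a crude stand-in for handling Hungarian case suffixes such as *Budapesten* versus *Budapest*.

```python
import math
import re

# Illustrative bead priors (ASSUMED values): 1-1 beads dominate, deletions are rare.
BEAD_PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099, (2, 1): 0.089, (1, 2): 0.089}
C = 1.0   # assumed expected target/source character-length ratio
S2 = 6.8  # assumed variance parameter of the length ratio


def length_cost(l1, l2):
    """Gale-Church-style cost: -log of the two-tailed probability of the length mismatch."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2 / C) / 2
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    p = math.erfc(abs(delta) / math.sqrt(2))  # equals 2 * (1 - Phi(|delta|))
    return -math.log(max(p, 1e-12))


def entities(sentences):
    """HYPOTHETICAL NE stand-in: numbers and capitalised tokens not at sentence start."""
    found = set()
    for sent in sentences:
        for k, tok in enumerate(re.findall(r"\w+", sent)):
            if tok.isdigit() or (k > 0 and tok[0].isupper()):
                found.add(tok)
    return found


def same_entity(x, y):
    # Crude suffix tolerance: "Budapesten" (Hungarian inessive) matches "Budapest".
    short, long_ = sorted((x, y), key=len)
    return len(short) >= 3 and long_.startswith(short)


def shared_anchors(src_sents, tgt_sents):
    """Count source entities that have a matching entity on the target side."""
    tgt_ents = entities(tgt_sents)
    return sum(1 for e in entities(src_sents) if any(same_entity(e, f) for f in tgt_ents))


def align(src, tgt, anchor_weight=2.0):
    """Minimum-cost bead sequence via dynamic programming; shared anchors lower the cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Relax every bead starting at (i, j); row-major order guarantees that
            # all predecessors of a cell are final before the cell is expanded.
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s, t = src[i:ni], tgt[j:nj]
                c = length_cost(sum(map(len, s)), sum(map(len, t))) - math.log(prior)
                c -= anchor_weight * shared_anchors(s, t)
                if cost[i][j] + c < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


if __name__ == "__main__":
    hu = ["Kovács Péter Budapesten él.", "Szereti a zenét.", "2008-ban Szegedre költözött."]
    en = ["Péter Kovács lives in Budapest.", "He likes music.", "In 2008 he moved to Szeged."]
    print(align(hu, en))  # [([0], [0]), ([1], [1]), ([2], [2])]
```

In this sketch, subtracting a bonus for shared anchors lets a few reliable entity matches pull the dynamic program toward the correct beads when sentence lengths alone are ambiguous, while the length model still handles sentence pairs that contain no entities.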

Krisztina Tóth, Richárd Farkas, András Kocsor

ACTAC 2008 | Entity Recognition | Hungarian-English Language Pair | Hybrid Method

» Collocation Extraction Using Monolingual Word Alignment Method

» Bitext Correspondences through Rich Markup

» Incorporating Linguistic Information to Statistical Word-Level Alignment

Post Info

Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2008
Where	ACTAC
Authors	Krisztina Tóth, Richárd Farkas, András Kocsor

Sciweavers
