Sciweavers

COLING
2010

An Empirical Study on Web Mining of Parallel Data

This paper presents an empirical approach to mining parallel corpora. Conventional approaches extract parallel sentences from a readily available collection of comparable, non-parallel corpora. This paper attempts the much more challenging task of searching the Web directly for high-quality sentence pairs. We tackle the problem by formulating good search queries using "Learning to Rank" and by filtering noisy document pairs using IBM Model 1 alignment. End-to-end evaluation shows that the proposed approach significantly improves the performance of statistical machine translation.
Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim
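
The abstract names IBM Model 1 alignment as the noise filter but gives no scoring details, so the following is only a minimal sketch of how such a filter could look, not the authors' implementation. It assumes a pre-trained lexical translation table t(f | e), here a toy dictionary, and scores a candidate pair by the length-normalized Model 1 log-probability. The names `model1_score`, `filter_pairs`, `NULL`, the probability floor, and the threshold are all hypothetical.

```python
import math

NULL = "<null>"  # hypothetical token for the empty source word used by IBM Model 1


def model1_score(src, tgt, t_table, floor=1e-9):
    """Length-normalized log P(tgt | src) under IBM Model 1.

    t_table maps (target_word, source_word) to a lexical translation
    probability; pairs missing from the table count as probability 0.
    """
    if not tgt:
        return float("-inf")
    src_with_null = [NULL] + list(src)
    total = 0.0
    for f in tgt:
        # Model 1 has a uniform alignment prior, so each target word's
        # probability is the average of t(f | e) over all source words.
        p = sum(t_table.get((f, e), 0.0) for e in src_with_null) / len(src_with_null)
        total += math.log(max(p, floor))  # floor avoids log(0) for unseen words
    return total / len(tgt)


def filter_pairs(pairs, t_table, threshold):
    """Keep only the (src, tgt) token-list pairs scoring at or above threshold."""
    return [(s, t) for s, t in pairs if model1_score(s, t, t_table) >= threshold]


# Toy translation table (assumed values, for illustration only).
T_TABLE = {
    ("haus", "house"): 0.8,
    ("das", "the"): 0.7,
    ("das", NULL): 0.1,
}

candidates = [
    (["the", "house"], ["das", "haus"]),  # plausible translation pair
    (["a", "cat"], ["das", "haus"]),      # noisy pair
]
kept = filter_pairs(candidates, T_TABLE, threshold=-5.0)
```

Normalizing by target length keeps long sentences from being penalized simply for having more words. In practice the threshold would have to be tuned on held-out data.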
Type Journal
Year 2010
Where COLING
Authors Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim