

Discriminative Sample Selection for Statistical Machine Translation

Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible training corpus by choosing informative, nonredundant source sentences from an available candidate pool for manual translation. We present a novel, discriminative sample selection strategy that preferentially selects batches of candidate sentences with constructs that lead to erroneous translations on a held-out development set. The proposed strategy supports a built-in diversity mechanism that reduces redundancy in the selected batches. Simulation experiments on English-to-Pashto and Spanish-to-English translation tasks demonstrate the superiority of the proposed approach to a number of competing techniques, such as random selection, diss...
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Authors Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard, Prem Natarajan
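
The abstract describes the selection strategy only at a high level, so the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that each development-set source sentence carries a hypothetical boolean flag marking whether its translation was judged erroneous, that "constructs" can be approximated by word n-grams, and that diversity can be approximated by discounting n-grams already covered by the current batch. The names `error_weights`, `select_batch`, `dev_error_flags`, and `decay` are all invented for this illustration.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams (as tuples) of order 1..n_max."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def error_weights(dev_sources, dev_error_flags, n_max=2):
    """Weight each dev-set n-gram by the fraction of the sentences
    containing it whose translation was flagged as erroneous.
    (Assumed proxy for 'constructs that lead to erroneous translations'.)"""
    bad, total = Counter(), Counter()
    for sent, is_bad in zip(dev_sources, dev_error_flags):
        for g in set(ngrams(sent.split(), n_max)):
            total[g] += 1
            if is_bad:
                bad[g] += 1
    return {g: bad[g] / total[g] for g in total}


def select_batch(pool, weights, batch_size, n_max=2, decay=0.5):
    """Greedily pick sentences with the highest mean n-gram weight.
    After each pick, the weights of its n-grams are multiplied by
    `decay`, so near-duplicates of selected sentences score lower.
    (Assumed stand-in for the built-in diversity mechanism.)"""
    weights = dict(weights)      # local copy; caller's weights are untouched
    remaining = list(pool)
    batch = []

    def score(sent):
        grams = set(ngrams(sent.split(), n_max))
        return sum(weights.get(g, 0.0) for g in grams) / max(len(grams), 1)

    while remaining and len(batch) < batch_size:
        best = max(remaining, key=score)
        batch.append(best)
        remaining.remove(best)
        for g in set(ngrams(best.split(), n_max)):
            if g in weights:
                weights[g] *= decay
    return batch


if __name__ == "__main__":
    w = error_weights(["the red car", "a blue house"], [True, False])
    print(select_batch(["red car", "red car", "the red"], w, batch_size=2))
```

In this toy run the second copy of "red car" is passed over in favor of "the red", because the n-grams it shares with the first pick have already been discounted. A real system would derive the error signal from translation output on the development set rather than from hand-set flags.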