Sciweavers

COLING
2010

Large Scale Parallel Document Mining for Machine Translation

A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an initial, low-quality batch translation. In contrast to other approaches which require specialized metadata, the system uses only the textual content of the documents. Results are presented for a corpus of over two billion web pages and for a large collection of digitized public-domain books.
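The idea of treating parallel-text mining as near-duplicate detection after a rough translation can be illustrated with a small sketch. This is not the authors' system: it assumes the foreign documents have already been machine-translated into English, and it swaps in a generic technique (word n-gram shingles compared by Jaccard similarity). The names `shingles`, `jaccard`, `mine_pairs`, and the `threshold` value are hypothetical and chosen only for illustration.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def mine_pairs(translated_foreign, english, threshold=0.5, n=3):
    """Pair each translated foreign document with its most similar
    English document, keeping pairs at or above the threshold.

    Both arguments are dicts mapping a document id to its text.
    The threshold is an illustrative assumption, not a tuned value.
    """
    english_shingles = {doc_id: shingles(text, n) for doc_id, text in english.items()}
    pairs = []
    for foreign_id, foreign_text in translated_foreign.items():
        foreign_set = shingles(foreign_text, n)
        best_id, best_score = None, 0.0
        for english_id, english_set in english_shingles.items():
            score = jaccard(foreign_set, english_set)
            if score > best_score:
                best_id, best_score = english_id, score
        if best_id is not None and best_score >= threshold:
            pairs.append((foreign_id, best_id, best_score))
    return pairs


# Toy data: "de1" stands in for a German page after rough translation.
english_docs = {
    "en1": "the cat sat on the mat today",
    "en2": "stock markets fell sharply on monday",
}
translated_docs = {"de1": "the cat sat on the mat yesterday"}
candidate_pairs = mine_pairs(translated_docs, english_docs)
```

The all-pairs comparison here is quadratic in the number of documents, so it only suits toy inputs; at the scale of billions of pages, some form of indexing over shared features would be needed to limit the candidates compared.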
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe Dubiner
Added 13 May 2011
Updated 13 May 2011
Type Conference
Year 2010
Where COLING
Authors Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe Dubiner