
ACL
2012

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles to the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multilingual text resources), which are much more widely available than parallel translation data. The toolkit presented here extracts parallel content from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual mapping of terms and named entities. In doing so, the toolkit pairs similar bilingual comparable documents and extracts parallel sentences together with bilingual terminological and named entity dictionaries. This demonstration focuses on the English, Latvian, Lithuanian, and Romanian languages.
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where ACL
Authors Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong Su, Inguna Skadina, Andrejs Vasiljevs, Bogdan Babych