Corpus Effects on the Evaluation of Automated Transliteration Systems


Most current machine transliteration systems employ a corpus of known source-target word pairs to train their systems, and typically evaluate those systems on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of the human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% (in absolute terms) depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems, and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.
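To make the distinction between absolute word accuracy and relative ranking concrete, here is a minimal Python sketch. It assumes word accuracy means the fraction of source words whose single top output matches any reference transliteration in the corpus; the abstract does not spell out the definition. The names `word_accuracy`, `ranking`, and `lookup_system`, and all of the toy data, are hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set

# Assumed corpus shape: source word -> set of acceptable target transliterations.
Corpus = Dict[str, Set[str]]
System = Callable[[str], str]


def word_accuracy(system: System, corpus: Corpus) -> float:
    """Fraction of source words whose top-1 output matches any reference."""
    if not corpus:
        raise ValueError("corpus is empty")
    correct = sum(1 for src, refs in corpus.items() if system(src) in refs)
    return correct / len(corpus)


def ranking(systems: Dict[str, System], corpus: Corpus) -> List[str]:
    """System names ordered from highest to lowest word accuracy."""
    return sorted(
        systems,
        key=lambda name: word_accuracy(systems[name], corpus),
        reverse=True,
    )


def lookup_system(table: Dict[str, str]) -> System:
    """Toy 'system' that returns a fixed output per source word."""
    return lambda word: table.get(word, "")


# Invented toy data: the same source words, with references collected from
# one transliterator versus several (more accepted variants per word).
corpus_one: Corpus = {"a": {"A1"}, "b": {"B1"}, "c": {"C1"}, "d": {"D1"}}
corpus_many: Corpus = {"a": {"A1", "A2"}, "b": {"B1", "B2"},
                       "c": {"C1", "C2"}, "d": {"D1"}}

systems = {
    "x": lookup_system({"a": "A1", "b": "B2", "c": "C1", "d": "D1"}),
    "y": lookup_system({"a": "A2", "b": "B2", "c": "C1", "d": "D9"}),
}

for label, corpus in (("one", corpus_one), ("many", corpus_many)):
    scores = {name: word_accuracy(fn, corpus) for name, fn in systems.items()}
    print(label, scores, ranking(systems, corpus))
```

In this toy setup the absolute scores of both systems change between the two corpora while their order stays the same, which is the kind of behaviour the abstract describes. The numbers themselves are made up and say nothing about the paper's actual results.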

Sarvnaz Karimi, Andrew Turpin, Falk Scholer


ACL 2007 | Computational Linguistics | Human Transliterators | Machine Transliteration Systems | Transliteration Systems


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	Sarvnaz Karimi, Andrew Turpin, Falk Scholer
