Corpus Effects on the Evaluation of Automated Transliteration Systems

Most current machine transliteration systems are trained on a corpus of known source-target word pairs, and are typically evaluated on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of the human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% (in absolute terms) depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems, and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.
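As a rough illustration of the metric discussed in the abstract, the sketch below computes word accuracy for two systems against two evaluation corpora and compares the resulting rankings. It is not the authors' code. It assumes that a system output counts as correct when it matches any variant supplied by the human transliterators, which is only one possible scoring convention. The function names, system names, and the toy romanized word pairs are all hypothetical and chosen purely for demonstration.

```python
from typing import Dict, List, Set

# Hypothetical types: a corpus maps each source word to the set of
# transliterations accepted by the human transliterators.
Corpus = Dict[str, Set[str]]


def word_accuracy(outputs: Dict[str, str], references: Corpus) -> float:
    """Fraction of source words whose system output matches an accepted variant.

    Assumption: a match against ANY human variant counts as correct.
    """
    if not references:
        raise ValueError("references must not be empty")
    correct = sum(
        1 for src, variants in references.items() if outputs.get(src) in variants
    )
    return correct / len(references)


def rank_systems(systems: Dict[str, Dict[str, str]], references: Corpus) -> List[str]:
    """System names ordered from highest to lowest word accuracy."""
    scores = {name: word_accuracy(out, references) for name, out in systems.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data (made up): the same source words, once with a single human
# transliterator and once with variants pooled from several transliterators.
corpus_single: Corpus = {
    "tom": {"tam"}, "anna": {"ana"}, "peter": {"piter"}, "lucy": {"lusi"},
}
corpus_multi: Corpus = {
    "tom": {"tam", "tom"}, "anna": {"ana", "anna"},
    "peter": {"piter", "peter"}, "lucy": {"lusi", "loosi"},
}

systems = {
    "system_a": {"tom": "tom", "anna": "ana", "peter": "piter", "lucy": "lusi"},
    "system_b": {"tom": "tom", "anna": "anna", "peter": "piter", "lucy": "lyusi"},
}

for label, corpus in (("single", corpus_single), ("multi", corpus_multi)):
    accuracies = {name: word_accuracy(out, corpus) for name, out in systems.items()}
    print(label, accuracies, rank_systems(systems, corpus))
```

In this toy setup the absolute accuracies differ between the two corpora while the ordering of the two systems is the same, which is the kind of comparison the abstract describes. The toy numbers carry no evidential weight of their own.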
Sarvnaz Karimi, Andrew Turpin, Falk Scholer
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ACL
Authors Sarvnaz Karimi, Andrew Turpin, Falk Scholer