Everybody loves a rich cousin: An empirical study of transliteration through bridge languages

Download www.cse.iitb.ac.in

Most state-of-the-art approaches to machine transliteration are data-driven and require significant parallel names corpora between languages. As a result, developing transliteration functionality among n languages could be a resource-intensive task, requiring parallel names corpora on the order of nC2, that is, one corpus for each of the n(n-1)/2 language pairs. In this paper, we explore ways of reducing this high resource requirement by transitively leveraging the parallel data available between subsets of the n languages. We propose, and show empirically, that reasonable-quality transliteration engines may be developed between two languages, X and Y, even when no direct parallel names data exists between them, but only transitively through a bridge language Z. Such systems significantly reduce the need for O(nC2) corpora. In addition, we show that the performance of such transitive transliteration systems is on par with that of direct transliteration systems in practical applications such as cross-language information retrieval (CLIR) systems.
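The resource argument in the abstract can be made concrete with a short sketch. The Python below is a hypothetical illustration, not the authors' system: it assumes a star layout in which every language is paired only with a single bridge language Z (so n-1 corpora instead of nC2), models each transliteration engine as a function returning scored candidates, and composes an X→Z engine with a Z→Y engine by summing P(z|x)·P(y|z) over the top-k intermediate candidates. The function names, the toy engines, the strings and their probabilities are all invented for illustration.

```python
from math import comb
from typing import Callable, Dict, List, Tuple

# A candidate list is a ranked list of (transliteration, probability) pairs.
Candidates = List[Tuple[str, float]]
Engine = Callable[[str], Candidates]


def corpora_needed_direct(n: int) -> int:
    # One parallel names corpus per unordered language pair: nC2 = n(n-1)/2.
    return comb(n, 2)


def corpora_needed_via_bridge(n: int) -> int:
    # Hypothetical star layout: each non-bridge language is paired only with Z.
    return max(n - 1, 0)


def transitive(x_to_z: Engine, z_to_y: Engine, k: int = 5) -> Engine:
    """Build a hypothetical X->Y engine by chaining X->Z and Z->Y engines."""

    def run(name: str) -> Candidates:
        scores: Dict[str, float] = {}
        # Keep only the top-k bridge candidates and, for each, the top-k outputs.
        for z_cand, p_z in x_to_z(name)[:k]:
            for y_cand, p_y in z_to_y(z_cand)[:k]:
                # Approximate P(y|x) by summing P(z|x) * P(y|z) over bridge strings z.
                scores[y_cand] = scores.get(y_cand, 0.0) + p_z * p_y
        # Return the Y candidates ranked by their combined score.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    return run


# Toy engines backed by lookup tables (invented data, for illustration only).
toy_x_to_z = {"kavya": [("KAVYA", 0.7), ("KAVIA", 0.3)]}
toy_z_to_y = {"KAVYA": [("kavja", 0.9), ("kavia", 0.1)], "KAVIA": [("kavia", 1.0)]}

x_to_y = transitive(
    lambda s: toy_x_to_z.get(s, []),
    lambda s: toy_z_to_y.get(s, []),
)

if __name__ == "__main__":
    print(corpora_needed_direct(10), corpora_needed_via_bridge(10))
    print(x_to_y("kavya"))
```

With 10 languages, the sketch needs 45 pairwise corpora for direct systems but only 9 under the assumed star layout. Because only the top-k candidates of each stage are kept, errors made by the X→Z engine can propagate into the Y output; the cutoff k is a hypothetical tuning knob here, not a parameter taken from the paper.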

Mitesh M. Khapra, A. Kumaran, Pushpak Bhattacharyya

Computational Linguistics | NAACL 2010 | Names Corpora | Parallel Names Corpora | Transliteration Systems

Post Info

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Conference
Year	2010
Where	NAACL
Authors	Mitesh M. Khapra, A. Kumaran, Pushpak Bhattacharyya

Sciweavers
