Sciweavers

IAJIT
2011

Improving the accuracy of English-Arabic statistical sentence alignment

Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step that is applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed sentences.
Mohammad Salameh, Rached Zantout, Nashat Mansour
Added 14 May 2011
Updated 14 May 2011
Type Journal
Year 2011
Where IAJIT