Training a statistical machine translation starts with tokenizing a parallel corpus. Some languages such as Chinese do not incorporate spacing in their writing system, which creat...
Searching online information is increasingly a daily activity for many people. The multilinguality of online content is also increasing (e.g. the proportion of English web users, ...
Yaser Al-Onaizan, Radu Florian, Martin Franz, Hany...
Little work to date in sentiment analysis (classifying texts by ‘positive’ or ‘negative’ orientation) has attempted to use fine-grained semantic distinctions in features ...
This paper describes the participation of DAEDALUS at the LogCLEF task. The focus of our experiments was to study if the difference between the native language of the user and the ...
In this article, we are studying the differences between the European languages using statistical and unsupervised methods. The analysis is conducted in different levels of languag...
Kimmo Kettunen, Markus Sadeniemi, Tiina Lindh-Knuu...