Bilingual Text Classification using the IBM 1 Translation Model

Manual categorisation of documents is a time-consuming task that has been significantly alleviated by the deployment of automatic and machine-aided text categorisation systems. However, multilingual documentation has become common in many international organisations, while most current systems have focused on the categorisation of monolingual text. It has recently been shown that the inherent redundancy in bilingual documents can be effectively exploited by relatively simple bilingual naive Bayes (multinomial) models. In this work, we present a refined version of these models in which this redundancy is explicitly captured by a combination of a unigram (multinomial) model and the well-known IBM 1 translation model. The proposed model is evaluated on two bilingual classification tasks and compared to previous work.
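To make the combination described in the abstract more concrete, here is a minimal sketch of one way a unigram model and IBM Model 1 could be combined into a bilingual classifier. It assumes the class-conditional probability factorises as p(x, y | c) = p(x | c) p(y | x, c), with p(x | c) a unigram model over the source text and p(y | x, c) an IBM Model 1 translation probability with uniform alignments. The abstract does not give the exact formulation, so this factorisation, the function names, the `floor` smoothing constant, the omission of length terms, and the toy parameters are all assumptions for illustration, not the authors' implementation.

```python
import math

NULL = "<null>"  # IBM Model 1 conventionally adds an empty source word


def unigram_logprob(words, unigram, floor=1e-6):
    """log p(x | c) under a unigram (multinomial) model; length term omitted."""
    return sum(math.log(unigram.get(w, floor)) for w in words)


def ibm1_logprob(target, source, ttable, floor=1e-6):
    """log p(y | x, c) under IBM Model 1 with uniform alignment probabilities."""
    src = [NULL] + list(source)
    total = 0.0
    for y in target:
        # Marginalise over every source position this target word could align to.
        s = sum(ttable.get((y, x), floor) for x in src)
        total += math.log(s / len(src))
    return total


def classify(source, target, models):
    """Return the class maximising log p(c) + log p(x | c) + log p(y | x, c)."""
    def score(c):
        m = models[c]
        return (math.log(m["prior"])
                + unigram_logprob(source, m["unigram"])
                + ibm1_logprob(target, source, m["ttable"]))
    return max(models, key=score)


# Toy, hand-set parameters (purely illustrative, not estimated from any data).
models = {
    "sport": {
        "prior": 0.5,
        "unigram": {"football": 0.6, "match": 0.4},
        "ttable": {("fútbol", "football"): 0.9, ("partido", "match"): 0.9},
    },
    "finance": {
        "prior": 0.5,
        "unigram": {"bank": 0.6, "market": 0.4},
        "ttable": {("banco", "bank"): 0.9, ("mercado", "market"): 0.9},
    },
}

print(classify(["football", "match"], ["fútbol", "partido"], models))
```

In a real system the unigram and translation tables would be estimated per class from training data (the translation table typically with EM); the sketch only shows how the two scores would be added in log space at classification time.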

Jorge Civera, Alfons Juan-Císcar

Categorisation | Education | LREC 2008 | Machine-aided Text Categorisation | Manual Categorisation

» Structural Feature Selection For English-Korean Statistical Machine Translation

» Cross-Language Frame Semantics Transfer in Bilingual Corpora

» Learning to Predict Case Markers in Japanese

» Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Jorge Civera, Alfons Juan-Císcar
