Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) when the system is trained on labeled documents in a source language (e.g. Italian). In this work we present several solutions, depending on the availability of bilingual resources, and we show that the problem can be addressed even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora. Experiments show the effectiveness of our approach, which provides a low-cost solution for the Cross-language Text Categorization task. In particular, when bilingual dictionaries are available, categorization performance gets close to that of monolingual text categorization.
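To make the task setup concrete, here is a minimal sketch of the dictionary-based setting only: train on labeled source-language (Italian) documents, map target-language (English) documents into the source vocabulary word by word, then classify. It is not the authors' Multilingual Domain Model technique. The toy documents, the `EN_TO_IT` dictionary, the centroid classifier, and all function names are assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: labeled training documents in the source language (Italian).
TRAIN_IT = [
    ("la squadra ha vinto la partita di calcio", "sport"),
    ("il giocatore ha segnato un gol", "sport"),
    ("la banca ha alzato i tassi di interesse", "economy"),
    ("il mercato azionario ha perso valore", "economy"),
]

# Hypothetical toy bilingual dictionary mapping target (English) to source (Italian).
EN_TO_IT = {
    "team": "squadra", "match": "partita", "football": "calcio",
    "player": "giocatore", "goal": "gol", "bank": "banca",
    "rates": "tassi", "interest": "interesse", "market": "mercato",
    "stock": "azionario", "value": "valore",
}


def bag_of_words(text):
    """Lowercase, whitespace-tokenize, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(labeled_docs):
    """Sum the term counts of all training documents per class."""
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def translate(text, dictionary):
    """Word-by-word lookup; words missing from the dictionary are kept unchanged."""
    return " ".join(dictionary.get(w, w) for w in text.lower().split())


def classify(text_en, centroids, dictionary):
    """Map an English document into Italian vocabulary and pick the closest class."""
    vec = bag_of_words(translate(text_en, dictionary))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


CENTROIDS = train_centroids(TRAIN_IT)
```

For example, `classify("the team won the football match", CENTROIDS, EN_TO_IT)` compares the translated document against the Italian class centroids. A word-by-word lookup of this kind ignores ambiguity and dictionary gaps, which is one reason a real system would need something richer than this baseline.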

Alfio Massimiliano Gliozzo, Carlo Strapparava

ACL 2006 | ACL 2007 | Cross-language Text Categorization | Text Categorization | Text Categorization Task

» Mining comparable bilingual text corpora for cross-language information integration

» Multilingual document clusters discovery

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	ACL
Authors	Alfio Massimiliano Gliozzo, Carlo Strapparava
