Mining comparable bilingual text corpora for cross-language information integration



Integrating information in multiple natural languages is a challenging task that often requires manually created linguistic resources such as a bilingual dictionary or examples of direct translations of text. In this paper, we propose a general cross-lingual text mining method that does not rely on any of these resources, but can exploit comparable bilingual text corpora to discover mappings between words and documents in different languages. Comparable text corpora are collections of text documents in different languages that are about similar topics; such text corpora are often naturally available (e.g., news articles in different languages published in the same time period). The main idea of our method is to exploit frequency correlations of words in different languages in the comparable corpora and discover mappings between words in different languages. Such mappings can then be used to further discover mappings between documents in different languages, achieving cross-lingual inf...
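
As a rough illustration of the frequency-correlation idea described in the abstract, the sketch below scores word pairs by how similarly their frequencies rise and fall across aligned time periods in two comparable corpora. The abstract does not say which correlation measure the authors use, so the choice of Pearson correlation is an assumption, as are the function names `pearson` and `best_mappings` and the toy English and Spanish frequency data, which are invented purely for demonstration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_mappings(freq_a, freq_b):
    """Map each word in corpus A to the corpus-B word whose frequency series correlates best.

    freq_a and freq_b are dicts: word -> list of counts, one count per aligned
    time period (for example, per day of news coverage). Hypothetical helper,
    not taken from the paper.
    """
    result = {}
    for word_a, series_a in freq_a.items():
        scored = [(pearson(series_a, series_b), word_b)
                  for word_b, series_b in freq_b.items()]
        score, word_b = max(scored)
        result[word_a] = (word_b, score)
    return result


# Invented toy counts over five aligned time periods (not real data).
freq_en = {"earthquake": [0, 5, 9, 1, 0], "election": [7, 1, 0, 0, 6]}
freq_es = {"terremoto": [0, 4, 8, 2, 0], "elecciones": [6, 2, 0, 1, 7]}

mappings = best_mappings(freq_en, freq_es)
for word, (match, score) in mappings.items():
    print(word, "->", match, round(score, 3))
```

No dictionary or parallel text is involved here: the only signal is that words about the same events tend to burst in the same periods. Word mappings of this kind could then be used to compare documents across languages, which is the second step the abstract mentions.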

Tao Tao, ChengXiang Zhai


Available Comparable Corpora | Comparable Bilingual Text | Comparable Text Corpora | Data Mining | KDD 2005



Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2005
Where	KDD
Authors	Tao Tao, ChengXiang Zhai
