IJCAI
2007

Pseudo-Aligned Multilingual Corpora

Download www.ijcai.org

In machine translation, document alignment refers to finding correspondences between documents that are exact translations of each other. We define pseudo-alignment as the task of finding topical, as opposed to exact, correspondences between documents in different languages. We apply semi-supervised methods to pseudo-align multilingual corpora. Specifically, we construct a topic-based graph for each language. Then, given exact correspondences between a subset of documents, we project the unaligned documents into a shared lower-dimensional space. We demonstrate that close documents in this lower-dimensional space tend to share the same topic. This has applications in machine translation and cross-lingual information analysis. Experimental results show that pseudo-alignment of multilingual corpora is feasible and that the document alignments produced are qualitatively sound. Our technique requires no linguistic knowledge of the corpus. On average, when 10% of the corpus consists of ...
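
The pipeline described in the abstract (one graph per language, a subset of exactly aligned documents, and a projection into a shared low-dimensional space) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the graphs are weighted or how the projection is computed, so the code below assumes cosine-weighted k-nearest-neighbour graphs and a Laplacian-eigenvector embedding of a joint graph in which each aligned pair is merged into a single node. The function names (`knn_affinity`, `pseudo_align`, `nearest_in_other_language`), the parameters `k` and `dim`, and the convention that the aligned documents occupy the first `n_aligned` rows of both matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_affinity(features, k):
    """Symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a document as its own neighbour
    n = sim.shape[0]
    k = min(k, n - 1)
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    weights = np.zeros((n, n))
    weights[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    return np.maximum(weights, weights.T)  # symmetrise the graph


def graph_laplacian(weights):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(weights.sum(axis=1)) - weights


def pseudo_align(docs_a, docs_b, n_aligned, k=5, dim=2):
    """Embed two corpora in one space (assumed formulation, see lead-in).

    Rows 0..n_aligned-1 of docs_a and docs_b are assumed to be exact
    translations of each other; each such pair becomes one joint node.
    """
    n_a, n_b = len(docs_a), len(docs_b)
    lap_a = graph_laplacian(knn_affinity(docs_a, k))
    lap_b = graph_laplacian(knn_affinity(docs_b, k))

    # Joint node index for every document in each language.
    idx_a = np.arange(n_a)
    idx_b = np.concatenate([np.arange(n_aligned),
                            n_a + np.arange(n_b - n_aligned)])

    n_joint = n_a + n_b - n_aligned
    joint = np.zeros((n_joint, n_joint))
    joint[np.ix_(idx_a, idx_a)] += lap_a
    joint[np.ix_(idx_b, idx_b)] += lap_b

    # Eigenvectors with the smallest eigenvalues vary slowly over the graph;
    # the first one is skipped because it is constant on a connected graph.
    _, vectors = np.linalg.eigh(joint)
    embedding = vectors[:, 1:dim + 1]
    return embedding[idx_a], embedding[idx_b]


def nearest_in_other_language(emb_a, emb_b):
    """For each document in A, the index of the closest document in B."""
    dists = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

Given two feature matrices (for example term-frequency vectors, one matrix per language), `pseudo_align(docs_a, docs_b, n_aligned=...)` returns one coordinate array per language, and `nearest_in_other_language` proposes a candidate pseudo-alignment for each document. Whether nearby points actually share a topic depends on the graph construction and on how many aligned pairs are available, so the output should be treated as a candidate list to inspect rather than a verified alignment.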

Fernando Diaz, Donald Metzler

Artificial Intelligence | Document | Exact Correspondence | IJCAI 2007 | Multilingual Corpora

Related Content

» Learning Common Grammar from Multilingual Corpus

» Disentangling from Babylonian Confusion Unsupervised Language Identification

» Report on CLEF2003 Experiments Two Ways of Extracting Multilingual Resources from Corpora

» Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications

» Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Sloven...

» Multilingual Extension of Temporal Expression Recognition Using Parallel Corpora

» MARS Multilingual Access and Retrieval System with Enhanced Query Translation and Document...

» Multilingual Document Clustering Using Wikipedia as External Knowledge

» MINT A Method for Effective and Scalable Mining of Named Entity Transliterations from Larg...

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	IJCAI
Authors	Fernando Diaz, Donald Metzler
