MINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora

In this paper, we address the problem of mining transliterations of Named Entities (NEs) from large comparable corpora. We leverage the empirical fact that multilingual news articles with similar news content are rich in Named Entity Transliteration Equivalents (NETEs). Our mining algorithm, MINT, uses a cross-language document similarity model to align multilingual news articles and then mines NETEs from the aligned articles using a transliteration similarity model. We show that our approach is highly effective on 6 different comparable corpora between English and 4 languages from 3 different language families. Furthermore, it performs substantially better than a state-of-the-art competitor.
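The two stages described above (align comparable articles, then mine candidate pairs from each aligned pair) can be illustrated with a minimal sketch. This is not the paper's implementation: both similarity models are simple stand-ins chosen for illustration. Document similarity is approximated by Jaccard overlap through a toy bilingual lexicon, and transliteration similarity by a character-level `difflib.SequenceMatcher` ratio over romanized tokens. All function names, thresholds, and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def doc_similarity(src_tokens, tgt_tokens, lexicon):
    """Stand-in cross-language similarity: Jaccard overlap after mapping
    source tokens through a toy bilingual lexicon (an assumption, not the
    paper's model)."""
    mapped = {lexicon[t] for t in src_tokens if t in lexicon}
    tgt = set(tgt_tokens)
    union = mapped | tgt
    return len(mapped & tgt) / len(union) if union else 0.0


def align_documents(src_docs, tgt_docs, lexicon, min_sim=0.2):
    """Stage 1: pair each source article with its best-scoring target
    article, keeping the pair only if the score reaches min_sim."""
    pairs = []
    for i, src in enumerate(src_docs):
        scored = [(doc_similarity(src, tgt, lexicon), j)
                  for j, tgt in enumerate(tgt_docs)]
        if not scored:
            continue
        best_sim, best_j = max(scored)
        if best_sim >= min_sim:
            pairs.append((i, best_j))
    return pairs


def translit_similarity(a, b):
    """Stand-in transliteration similarity: character-level match ratio
    between two romanized tokens (hypothetical substitute for a trained
    transliteration model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_pairs(src_docs, tgt_docs, lexicon, min_sim=0.2, min_translit=0.7):
    """Stage 2: within each aligned article pair, keep word pairs whose
    transliteration similarity reaches min_translit."""
    mined = set()
    for i, j in align_documents(src_docs, tgt_docs, lexicon, min_sim):
        for s in set(src_docs[i]):
            if s in lexicon:
                # Assumption: ordinary dictionary words are not NE candidates.
                continue
            for t in set(tgt_docs[j]):
                if translit_similarity(s, t) >= min_translit:
                    mined.add((s, t))
    return mined


# Toy data: English articles and romanized target-language articles.
lexicon = {"visited": "yatra", "today": "aaj", "cricket": "cricket"}
english = [["obama", "visited", "delhi", "today"],
           ["cricket", "score", "tendulkar"]]
target = [["obaama", "dilli", "aaj", "yatra"],
          ["cricket", "tendulkar", "kal"]]

alignment = align_documents(english, target, lexicon)
mined = mine_pairs(english, target, lexicon)
```

Restricting the word-pair comparison to aligned articles is what keeps the search tractable on large corpora: each source word is compared only against the words of one target article rather than the whole target collection.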

Raghavendra Udupa, K. Saravanan, A. Kumaran, Jagadeesh Jagarlamudi

Comparable Corpora | EACL 2009 | Large Comparable Corpora | Natural Language Processing | Similarity Model

Added	17 Feb 2011
Updated	17 Feb 2011
Type	Journal
Year	2009
Where	EACL
Authors	Raghavendra Udupa, K. Saravanan, A. Kumaran, Jagadeesh Jagarlamudi

Sciweavers
