Abstract. This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entites (NEs) such as persons, locations, organiza...
This paper addresses the problem of mining named entity translations from comparable corpora, specifically, mining English and Chinese named entity translation. We first observe...
Jinhan Kim, Long Jiang, Seung-won Hwang, Young-In ...
This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual...
We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from compara...
We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We in...
In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source...
In this paper, we address the problem of mining transliterations of Named Entities (NEs) from large comparable corpora. We leverage the empirical fact that multilingual news artic...
Raghavendra Udupa, K. Saravanan, A. Kumaran, Jagad...
Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the ...
Previous attempts at identifying translational equivalents in comparable corpora have dealt with very large `general language' corpora and words. We address this task in a sp...
Abstract. The paper proposes a method to improve the extraction of lowfrequency translation equivalents from comparable corpora. Prior to performing the mapping between vector spac...
Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, Andr...