Sciweavers

51 search results - page 8 / 11
» Automatic Filtering of Bilingual Corpora for Statistical Mac...
LREC
2008
13 years 9 months ago
Improving Statistical Machine Translation Efficiency by Triangulation
In current phrase-based Statistical Machine Translation systems, more training data is generally better than less. However, a larger data set eventually introduces a larger model ...
Yu Chen, Andreas Eisele, Martin Kay
ACL
2010
13 years 5 months ago
Pseudo-Word for Phrase-Based Machine Translation
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But the word appears to be too fine-grained ...
Xiangyu Duan, Min Zhang, Haizhou Li
ACL
2008
13 years 9 months ago
Distributed Word Clustering for Large Scale Class-Based Language Modeling in Machine Translation
In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we invest...
Jakob Uszkoreit, Thorsten Brants
COLING
2010
13 years 2 months ago
An Empirical Study on Web Mining of Parallel Data
This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim
ECIR
2010
Springer
13 years 9 months ago
Estimating Translation Probabilities from the Web for Structured Queries on CLIR
We present two methods for estimating replacement probabilities without using parallel corpora. The first method proposed exploits the possible translation probabilities latent in ...
Xabier Saralegi, Maddalen Lopez de Lacalle