The merit of phrase-based statistical machine translation is often reduced by the complexity to construct it. In this paper, we address some issues in phrase-based statistical mach...
Hendra Setiawan, Haizhou Li, Min Zhang, Beng Chin ...
Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We eva...
Obtaining high-quality machine translations is still a long way off. A postediting phase is required to improve the output of a machine translation system. An alternative is the s...
The paper presents Bulgarian National Corpus project (BulNC) - a large-scale, representative, online available corpus of Bulgarian. The BulNC is also a monolingual general corpus,...
Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We appl...