Almost all Chinese language processing tasks involve word segmentation of the language input as their first steps, thus robust and reliable segmentation techniques are always requ...
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid a data sparseness problem for spoken language in that it is difficult to collect traini...
Abstract. The identification of reliable and interesting items on Internet becomes more and more difficult and time consuming. This paper is a position paper describing our intend...
This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilin...
Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneous...