We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a s...
We present a novel framework for word alignment that incorporates synonym knowledge collected from monolingual linguistic resources in a bilingual probabilistic model. Synonym inf...
Abstract. Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have ...
This paper describes an accurate and robust text alignment system for structurally different languages. Among structurally different languages such as Japanese and English, there ...
Ensemble methods that train multiple learners and then combine their predictions have been shown to be very effective in supervised learning. This paper explores ensemble methods ...