Current statistical machine translation systems usually extract rules from bilingual corpora annotated with 1-best alignments. They are prone to learn noisy rules due to alignment...
We describe a new pruning approach to remove phrase pairs from translation models of statistical machine translation systems. The approach applies the original translation system ...
We report about tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-nou...
Web search is challenging partly due to the fact that search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis ...
This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages, such as Thai, Japanese and Korea that exhibit un...