This paper describes the CMU/InterACT effort in developing an Arabic Automatic Speech Recognition (ASR) system for broadcast news and conversations within the GALE 2006 evaluation...
We present an approach to using multiple preprocessing schemes to improve statistical word alignments. We show a relative reduction of alignment error rate of about 38%.
The aim of this work is to show the ability of finite-state transducers to simultaneously translate speech into multiple languages. Our proposal deals with an extension of stocha...
We evaluate semantic relatedness measures on different German datasets showing that their performance depends on: (i) the definition of relatedness that was underlying the constr...
We describe our linguistic rule-based tagger IceTagger, and compare its tagging accuracy to the TnT tagger, a state-of-theart statistical tagger, when tagging Icelandic, a morphol...
Customization to specific domains of discourse and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably eff...
Training statistical models to detect nonnative sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the exte...
This paper describes an efficient method to extract large n-best lists from a word graph produced by a statistical machine translation system. The extraction is based on the k sh...