This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessin...
Cross-language information retrieval (CLIR) today is dominated by techniques that use token-to-token mappings from bilingual dictionaries. Yet, state-of-the-art statistical transl...
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from ...
Model M, a novel class-based exponential language model, has been shown to significantly outperform word n-gram models in state-of-the-art machine translation and speech recognit...
We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of su...