Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Cross-language latent semantic indexing is a method that learns useful languageindependent vector representations of terms through a statistical analysis of a documentaligned text...
Translation of proper names is generally recognized as a significant problem in many multi-lingual text and speech processing applications. Even when large bilingual lexicons use...
Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between source and target language. These models are...
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from ...