Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregati...
Existing Language Identification (LID) approaches reach 100% precision in most common situations when dealing with documents written in just one language, and when those docu...
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross-language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
Techniques for learning from data typically require data to be in standard form. Measurements must be encoded in a numerical format such as binary true-or-false features, numerica...
V. Seshadri, Raguram Sasisekharan, Sholom M. Weiss
We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our m...