In this work we concentrate on generating compound words with high order n-gram information for speech recognition. In most existing compound words generation methods, only bi-gra...
Speaker diarization of meeting recordings is generally based on acoustic information ignoring that meetings are instances of conversations. Several recent works have shown that th...
Fabio Valente, Deepu Vijayasenan, Petr Motlí...
The Web N-gram Workshop was held on July 23, 2010 in Geneva, Switzerland, in conjunction with the 33rd Annual ACM SIGIR Conference. The workshop brought together leaders in inform...
Chengxiang Zhai, Kuansan Wang, David Yarowsky, Ste...
We present three novel methods of compactly storing very large n-gram language models. These methods use substantially less space than all known approaches and allow n-gram probab...
In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude fea...
For European languages, n-gram has proved to be the cost effective alternative to morphological processing during indexing task and it has been studied and analyzed extensively us...
Since only simple symbol-based manipulations are needed,n-gram indexingis used for naturallanguageswhere syntactic or semantic analyses are often difficult. Music, whose automatic...
Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language independent way,...
Our group in the Department of Informatics at the University of Oviedo has participated, for the first time, in two tasks at CLEF: monolingual (Russian) and bilingual (Spanish-to-E...
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to deter...