Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data stru...
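The LOUDS encoding mentioned in this abstract can be pictured with a small, illustrative sketch. The Python below is not the paper's implementation: it builds the LOUDS bit sequence for a toy trie of n-gram contexts and finds a node's first child with naive rank/select, whereas a real succinct structure answers rank/select in constant time with small auxiliary indexes. The trie layout, function names, and toy data are assumptions for illustration.

```python
# Minimal LOUDS sketch over a trie of n-gram contexts (illustrative only).
# rank/select are computed naively; a succinct implementation would answer
# them in O(1) using o(n) extra bits.
from collections import deque

def build_louds(trie):
    """trie: nested dicts, e.g. {'the': {'cat': {}, 'dog': {}}}.
    Returns (bits, labels): bits is the LOUDS sequence as a list of 0/1,
    labels holds node labels in level order (root excluded)."""
    bits, labels = [1, 0], []          # virtual super-root: "10"
    queue = deque([trie])              # breadth-first traversal
    while queue:
        node = queue.popleft()
        for label, child in sorted(node.items()):
            bits.append(1)             # one '1' per child
            labels.append(label)
            queue.append(child)
        bits.append(0)                 # '0' closes this node's child list
    return bits, labels

def select(bits, bit, i):
    """0-based position of the i-th occurrence (1-based count) of `bit`."""
    seen = 0
    for pos, b in enumerate(bits):
        if b == bit:
            seen += 1
            if seen == i:
                return pos
    raise IndexError

def rank(bits, bit, pos):
    """Number of occurrences of `bit` in bits[0..pos], inclusive."""
    return sum(1 for b in bits[:pos + 1] if b == bit)

def first_child(bits, node_id):
    """node_id is the 1-based level-order number of a node (1 = root).
    Returns the level-order number of its first child, or None."""
    pos = select(bits, 0, node_id) + 1   # start of the node's child list
    return rank(bits, 1, pos) if bits[pos] == 1 else None

if __name__ == "__main__":
    bits, labels = build_louds({"the": {"cat": {}, "dog": {}}, "a": {"cat": {}}})
    print(bits, labels, first_child(bits, 1))   # first child of the root
```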
In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our...
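The dynamic-programming core of such a blocked sampler can be sketched briefly. The hypothetical Python below forward-filters the probability of every sentence prefix and then samples word boundaries backwards; the `word_prob` function is a toy stand-in, whereas the paper's sampler draws word probabilities from a learned nested Pitman-Yor language model and resamples whole sentences inside a Gibbs sweep.

```python
# Forward-filtering backward-sampling sketch for word segmentation
# (the DP step of a blocked Gibbs sweep). word_prob is a toy stand-in.
import random

MAX_WORD_LEN = 8

def word_prob(word):
    # Hypothetical unigram word probability: penalize length so the sampler
    # prefers shorter words; the real model is learned from data.
    return (1.0 / 50.0) * (0.5 ** len(word))

def sample_segmentation(chars, rng=random.Random(0)):
    n = len(chars)
    # alpha[t] = probability of the prefix chars[:t], summed over segmentations
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(MAX_WORD_LEN, t) + 1):
            alpha[t] += word_prob(chars[t - k:t]) * alpha[t - k]
    # Backward pass: sample the last word, then recurse toward the start.
    words, t = [], n
    while t > 0:
        lengths = list(range(1, min(MAX_WORD_LEN, t) + 1))
        weights = [word_prob(chars[t - k:t]) * alpha[t - k] for k in lengths]
        k = rng.choices(lengths, weights=weights)[0]
        words.append(chars[t - k:t])
        t -= k
    return list(reversed(words))

if __name__ == "__main__":
    print(sample_segmentation("nowitcanbetold"))
```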
Implicit discourse relation recognition is difficult due to the absence of explicit discourse connectives between arbitrary spans of text. In this paper, we use language models to...
Zhi Min Zhou, Man Lan, Zhengyu Niu, Yu Xu, Jian Su
We present results from a range of experiments on article and preposition error correction for non-native speakers of English. We first compare a language model and error-specific ...
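As a rough illustration of the language-model side of such a comparison, the sketch below scores each candidate article for a marked slot with a tiny bigram model and keeps the best one. The corpus, smoothing, and length normalization are illustrative choices, not the experimental setup of the paper.

```python
# Toy language-model article selection: fill a slot with the candidate that
# yields the highest bigram score for the surrounding context.
import math
from collections import defaultdict

ARTICLES = ["a", "an", "the", ""]   # "" = no article

def train_bigram(sentences):
    counts, unigrams, vocab = defaultdict(int), defaultdict(int), set()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            counts[(prev, cur)] += 1
            unigrams[prev] += 1
    V = len(vocab)
    def score(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        lp = sum(math.log((counts[(p, c)] + 1) / (unigrams[p] + V))
                 for p, c in zip(toks, toks[1:]))
        return lp / (len(toks) - 1)   # normalize so "no article" isn't favoured by length
    return score

def correct_article(score, left, right):
    """Choose the article between the `left` and `right` context strings."""
    candidates = [" ".join(filter(None, [left, art, right])) for art in ARTICLES]
    scores = [score(c) for c in candidates]
    return ARTICLES[scores.index(max(scores))]

if __name__ == "__main__":
    lm = train_bigram(["he read the book", "she wrote a book", "he read the paper"])
    print(correct_article(lm, "he read", "book"))   # -> "the"
```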
In recent years, with the rapid proliferation of digital images, the need to search and retrieve the images accurately, efficiently, and conveniently is becoming more acute. Automa...
In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach in...
Sindhu Raghavan, Adriana Kovashka, Raymond J. Moon...
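A rough sketch of the general idea: estimate production probabilities per author and attribute a test document to the author whose grammar assigns it the highest likelihood. The sketch assumes parse trees are already available as lists of (lhs, rhs) productions; the paper itself derives the grammars from a statistical parser's output, and the smoothing here is an illustrative choice.

```python
# Illustrative PCFG-based authorship attribution, assuming documents are
# represented as lists of productions (lhs, rhs) from their parse trees.
import math
from collections import defaultdict

def train_pcfg(documents):
    """documents: list of production lists for one author.
    Returns P(lhs -> rhs) with add-one smoothing over seen rules."""
    rule_counts = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for lhs, rhs in doc:
            rule_counts[lhs][rhs] += 1
    probs = {}
    for lhs, rhss in rule_counts.items():
        total = sum(rhss.values()) + len(rhss)
        probs[lhs] = {rhs: (c + 1) / total for rhs, c in rhss.items()}
    return probs

def log_likelihood(pcfg, doc, floor=1e-6):
    # Floor unseen productions so the log never blows up.
    return sum(math.log(pcfg.get(lhs, {}).get(rhs, floor)) for lhs, rhs in doc)

def attribute(author_pcfgs, doc):
    return max(author_pcfgs, key=lambda a: log_likelihood(author_pcfgs[a], doc))

if __name__ == "__main__":
    # Tiny hypothetical productions for two "authors".
    a = [[("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("VP", ("VBZ", "NP"))]]
    b = [[("S", ("VP",)), ("VP", ("VB", "NP")), ("NP", ("PRP",))]]
    models = {"author_a": train_pcfg(a), "author_b": train_pcfg(b)}
    print(attribute(models, [("S", ("NP", "VP")), ("NP", ("DT", "NN"))]))
```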
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, ...
Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hy...
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...
In this paper, we explore how to use meta-data information in information retrieval tasks. We present a new language model that is able to take advantage of the category informa...
Rong Jin, Luo Si, Alexander G. Hauptmann, James P....
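One simple way to picture the use of category meta-data in a retrieval language model is to smooth each document model with its category model and the collection model, as in the sketch below. The interpolation form, weights, and toy data are assumptions for illustration, not necessarily the model proposed in the paper.

```python
# Query-likelihood retrieval with category-smoothed document models
# (illustrative interpolation; weights are arbitrary).
import math
from collections import Counter

LAMBDA_DOC, LAMBDA_CAT = 0.6, 0.25   # remainder goes to the collection model

def unigram(tokens):
    counts, total = Counter(tokens), len(tokens)
    return lambda w: counts[w] / total if total else 0.0

def score(query, doc_tokens, cat_tokens, coll_tokens):
    p_doc, p_cat, p_coll = unigram(doc_tokens), unigram(cat_tokens), unigram(coll_tokens)
    lam_coll = 1.0 - LAMBDA_DOC - LAMBDA_CAT
    total = 0.0
    for w in query.split():
        p = LAMBDA_DOC * p_doc(w) + LAMBDA_CAT * p_cat(w) + lam_coll * p_coll(w)
        total += math.log(p) if p > 0 else math.log(1e-12)   # floor unseen terms
    return total

if __name__ == "__main__":
    docs = {"d1": ("sports", "the match ended in a draw".split()),
            "d2": ("finance", "markets closed higher on friday".split())}
    coll = [w for _, toks in docs.values() for w in toks]
    cats = {}
    for cat, toks in docs.values():
        cats.setdefault(cat, []).extend(toks)
    query = "match results"
    ranked = sorted(docs, reverse=True,
                    key=lambda d: score(query, docs[d][1], cats[docs[d][0]], coll))
    print(ranked)   # d1 ranks first: its document and category models cover "match"
```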
A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract mea...