Abstract. We show several PAC-style concentration bounds for learning unigrams language model. One interesting quantity is the probability of all words appearing exactly k times in...
In this work we study some probabilistic models for the random generation of words over a given alphabet used in the literature in connection with pattern statistics. Our goal is t...
The process of labeling each word in a sentence with one of its lexical categories (noun, verb, etc) is called tagging and is a key step in parsing and many other language processi...
We show that the standard beam-search algorithm can be used as an efficient decoder for the global linear model of Zhang and Clark (2008) for joint word segmentation and POS-taggi...
We present in this paper a new learning problem called learning distributions from experts. In the case we study the experts are stochastic deterministic finite automata (sdfa). W...