: This article describes a multilayer model-based approach for text compression. It uses linguistic information to develop a multilayer decomposition model of the text in order to ...
In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our...
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
We propose a simple two-level hierarchical probability model for unsupervised word segmentation. By treating words as strings composed of morphemes/phonemes which are themselves c...
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment...
Sharon Goldwater, Thomas L. Griffiths, Mark Johnso...