Contextual Dependencies in Unsupervised Word Segmentation

Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
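As a rough illustration of what a unigram assumption means for segmentation, the sketch below finds the most probable split of an unsegmented string when each word is scored independently of its neighbours. It is not the paper's method: the Bayesian models described above learn the lexicon without supervision, whereas this sketch assumes a fixed, hypothetical lexicon (`UNIGRAM`, with made-up probabilities) and uses exact dynamic programming. The names `UNIGRAM`, `segment`, and `max_len` are assumptions made for the example.

```python
import math

# Hypothetical toy lexicon with made-up probabilities (not taken from the paper).
UNIGRAM = {
    "the": 0.3, "dog": 0.2, "do": 0.1, "g": 0.01,
    "barks": 0.2, "bark": 0.1, "s": 0.01,
}

def segment(text, probs=UNIGRAM, max_len=10):
    """Return the highest-probability split of `text` into lexicon words,
    scoring each word independently (unigram assumption), or None if no
    split into known words exists."""
    n = len(text)
    # best[i] = (log-probability of the best split of text[:i], start of its last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(probs[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Follow the back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

print(segment("thedogbarks"))
```

A bigram variant would score each word conditioned on the previous one, which is the kind of contextual dependency the abstract credits for the improvement.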

Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

ACL 2006 | ACL 2007 | Bigram Model | Word Segmentation | Word Segmentation Methods |

» Linguistically Motivated Unsupervised Segmentation for Machine Translation

» Unsupervised Discovery of Compound Entities for Relationship Extraction

» Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Transl...

» Morphological Analysis by Multiple Sequence Alignment

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	ACL
Authors	Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Sciweavers
