Unsupervised Multilingual Learning for POS Tagging


We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while capturing cross-lingual patterns in tag distribution for aligned words. Once the parameters of our model have been learned on bilingual parallel data, we evaluate its performance on a held-out monolingual test set. Our evaluation on six pairs of languages shows consistent and significant performance gains over a state-of-the-art monolingual baseline. For one language pair, we observe a relative reduction in error of 53%.
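The "relative reduction in error" quoted above is the fraction of the baseline's errors that the new model eliminates. The sketch below shows that arithmetic only. The function name and the two error rates are hypothetical, chosen so that the result comes out near 53%; they are not figures reported in the paper.

```python
def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Return the fraction of baseline errors removed by the new model.

    Both arguments are error rates in [0, 1], i.e. 1 - tagging accuracy.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical error rates for illustration only (not taken from the paper).
baseline = 0.200   # assumed monolingual baseline: 20.0% of tags wrong
bilingual = 0.094  # assumed bilingual model: 9.4% of tags wrong

reduction = relative_error_reduction(baseline, bilingual)
print(f"{reduction:.0%}")
```

Note that a relative reduction is distinct from an absolute one: in this made-up example the absolute drop is 10.6 percentage points, while the relative drop is about 53%.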

Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, Regina Barzilay


EMNLP 2008 | Hierarchical Bayesian Model | Multilingual Learning | Natural Language Processing | Unsupervised Part-of-speech |


» Unsupervised Lexical Acquisition for Part of Speech Tagging

» Automatic Refinement of a POS Tagger Using a Reliable Parser and Plain Text Corpora

» A Simple Unsupervised Learner for POS Disambiguation Rules Given Only a Minimal Lexicon

» Climbing the Tower of Babel: Unsupervised Multilingual Learning

» A Hybrid Model for Part-of-Speech Tagging and its Application to Bengali

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	EMNLP
Authors	Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, Regina Barzilay
