
COLING
2010

A Multi-Domain Web-Based Algorithm for POS Tagging of Unknown Words

We present a web-based algorithm for the task of POS tagging of unknown words (words appearing only a small number of times in the training data of a supervised POS tagger). When a sentence s containing an unknown word u is to be tagged by a trained POS tagger, our algorithm collects from the web contexts that are partially similar to the context of u in s, which are then used to compute new tag assignment probabilities for u. Our algorithm enables fast multi-domain unknown word tagging, since, unlike previous work, it does not require a corpus from the new domain. We integrate our algorithm into the MXPOST POS tagger (Ratnaparkhi, 1996) and experiment with three languages (English, German and Chinese) in seven in-domain and domain adaptation scenarios. Our algorithm provides an error reduction of up to 15.63% (English), 18.09% (German) and 13.57% (Chinese) over the original tagger.
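The core idea in the abstract (gather contexts that partly match the context of u, then re-estimate tag probabilities for u from them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the query patterns, the averaging rule, the function names `partial_context_queries` and `reestimate_tag_probs`, the in-memory `fake_web` lookup that stands in for a web search, and the toy tagger are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def partial_context_queries(left: str, word: str, right: str) -> List[str]:
    """Build queries that share all or part of the word's original context.

    Hypothetical patterns: full trigram, left bigram, right bigram.
    """
    return [f"{left} {word} {right}", f"{left} {word}", f"{word} {right}"]


def reestimate_tag_probs(
    contexts: Iterable[str],
    tag_dist: Callable[[str], Dict[str, float]],
) -> Dict[str, float]:
    """Average the per-context tag distributions into one distribution.

    `tag_dist` is an assumed callback that returns tag probabilities for the
    target word in a given context (e.g. from an existing trained tagger).
    """
    totals: Counter = Counter()
    n = 0
    for ctx in contexts:
        for tag, p in tag_dist(ctx).items():
            totals[tag] += p
        n += 1
    if n == 0:
        return {}  # no evidence collected; caller keeps the tagger's own guess
    return {tag: total / n for tag, total in totals.items()}


# Toy stand-in for web search results (assumption: no real search API is used).
fake_web = {
    "to blog": ["to blog about", "to blog every"],
    "blog daily": ["the blog daily"],
}


def toy_tag_dist(context: str) -> Dict[str, float]:
    # Toy rule for illustration only: a preceding "to" suggests a verb.
    return {"VB": 1.0} if context.startswith("to ") else {"NN": 1.0}


queries = partial_context_queries("to", "blog", "daily")
contexts = [ctx for q in queries for ctx in fake_web.get(q, [])]
probs = reestimate_tag_probs(contexts, toy_tag_dist)
```

In this toy setup the re-estimated distribution depends only on the collected contexts, which mirrors the abstract's point that no corpus from the new domain is required; how the real system weights or combines such evidence inside MXPOST is not specified here.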
Shulamit Umansky-Pesin, Roi Reichart, Ari Rappoport
Added 13 May 2011
Updated 13 May 2011
Type Conference
Year 2010
Where COLING
Authors Shulamit Umansky-Pesin, Roi Reichart, Ari Rappoport