Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities



This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
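As a rough illustration of how word co-occurrence probabilities can be turned into semantic features, the sketch below scores word pairs with pointwise mutual information (PMI) over a tiny in-memory corpus. The abstract does not give the exact scoring formula, so PMI is an assumption here, and the names `pmi`, `semantic_features`, and `TOY_CORPUS` are hypothetical. The toy corpus only stands in for the terabyte-scale web corpus queried through the Waterloo MultiText System.

```python
import math

# Hypothetical toy corpus: each document is the set of words it contains.
TOY_CORPUS = [
    {"bank", "money", "loan"},
    {"bank", "river", "water"},
    {"money", "loan", "interest"},
    {"river", "water", "fish"},
]


def pmi(documents, word_a, word_b):
    """Return log2(p(a, b) / (p(a) * p(b))) from document-level counts.

    Returns None when the pair never co-occurs (or the corpus is empty),
    because the logarithm is undefined in that case.
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if word_a in doc)
    count_b = sum(1 for doc in documents if word_b in doc)
    count_ab = sum(1 for doc in documents if word_a in doc and word_b in doc)
    if n == 0 or count_ab == 0:
        return None
    p_a = count_a / n
    p_b = count_b / n
    p_ab = count_ab / n
    return math.log2(p_ab / (p_a * p_b))


def semantic_features(documents, context_words, cue_words):
    """Build a flat feature vector of PMI scores for (context, cue) pairs.

    Undefined scores are mapped to 0.0, a simplification made only for
    this sketch so that the vector stays numeric.
    """
    features = []
    for context in context_words:
        for cue in cue_words:
            score = pmi(documents, context, cue)
            features.append(0.0 if score is None else score)
    return features
```

For example, `semantic_features(TOY_CORPUS, ["money"], ["loan", "river"])` yields one score per cue word. A vector like this could then be concatenated with syntactic features and passed to a supervised learner.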

Peter D. Turney


CORR 2004 | Education | English Lexical Sample | Rule-based Part-of-speech Tagger | Word Sense Disambiguation


Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2004
Where	CORR
Authors	Peter D. Turney
