Text Categorization for Improved Priors of Word Meaning



Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) sense of a word when contextual clues are not strong enough. The topic domain of a document has a strong influence on the sense distribution of words. Unfortunately, it is not feasible to produce large manually sense-annotated corpora for every domain of interest. Previous experiments have shown that unsupervised estimation of the predominant sense of certain words using corpora whose domain has been determined by hand outperforms estimates based on domain-independent text for a subset of words and even outperforms the estimates based on counting occurrences in an annotated corpus. In this paper we address the question of whether we can automatically produce domain-specific corpora which could be used to acquire predominant senses appropriate for specific domains. We collect the corpora by automatically cla...

Rob Koeling, Diana McCarthy, John Carroll
