In this paper, word sense disambiguation (WSD) accuracy achievable by a probabilistic classifier, using very minimal training sets, is investigated. We made the assumption that there are no tagged corpora available and identified what information, needed by an accurate WSD system, can and cannot be automatically obtained. The lesson learned can then be used to focus on what knowledge needs manual annotation. Our system, named Bayesian Hierarchical Disambiguator (BHD), uses the Internet, arguably the largest corpus in existence, to address the sparse data problem, and uses WordNet's hierarchy for semantic contextual features. In addition, Bayesian networks are automatically constructed to represent knowledge learned from training sets by modeling the selectional preference of adjectives. These networks are then applied to disambiguation by performing inferences on unseen adjective-noun pairs. We demonstrate that this system is able to disambiguate adjectives in unrestr...
Gerald Chao, Michael G. Dyer