Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge provided by domain experts, knowledge specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistics information that is derived from the data set, and sense information retrieved from WordNet that is used to build a semantic binary model. We evaluated the efficacy of using our combined ensemble model on the Reuters-21578 and 20newsgroups data sets. Keywords WordNet, ensemble learning, text clustering, disambiguation.
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan