
SAC
2009
ACM

Combining statistics and semantics via ensemble model for document clustering

Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge from domain experts, and that knowledge is specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet, which is used to build a semantic binary model. We evaluated the efficacy of the combined ensemble model on the Reuters-21578 and 20 Newsgroups data sets.
Keywords: WordNet, ensemble learning, text clustering, disambiguation.
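The abstract does not say exactly how the two information sources are combined. As a rough illustration only, the following sketch fuses a term-frequency cosine similarity (the statistical view) with a cosine similarity over binary sense vectors (the semantic view), then groups documents by thresholding the fused similarity. This weighted similarity fusion is a swapped-in, simplified technique, not the authors' ensemble model. The `SENSES` table is a hypothetical stand-in for real WordNet lookups, and `alpha` and `threshold` are illustrative parameters, not values from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: word -> set of sense IDs.
# A real system would query WordNet; these mappings are invented for illustration.
SENSES = {
    "car": {"vehicle.n.01"},
    "automobile": {"vehicle.n.01"},
    "engine": {"machine.n.01"},
    "motor": {"machine.n.01"},
    "bank": {"bank.n.01", "bank.n.02"},
    "loan": {"loan.n.01"},
    "credit": {"loan.n.01"},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def statistical_vector(tokens):
    """Statistical view: raw term frequencies from the document itself."""
    return Counter(tokens)


def semantic_vector(tokens):
    """Semantic view: binary vector with 1 for every sense any token maps to."""
    return {sense: 1 for tok in tokens for sense in SENSES.get(tok, ())}


def combined_similarity(docs, alpha=0.5):
    """Weighted fusion: alpha * statistical + (1 - alpha) * semantic similarity."""
    stat = [statistical_vector(d) for d in docs]
    sem = [semantic_vector(d) for d in docs]
    n = len(docs)
    return [
        [alpha * cosine(stat[i], stat[j]) + (1 - alpha) * cosine(sem[i], sem[j])
         for j in range(n)]
        for i in range(n)
    ]


def cluster(sim, threshold):
    """Label connected components of the graph whose edges have sim >= threshold."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster IDs in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


docs = [
    ["car", "engine"],
    ["automobile", "motor"],
    ["bank", "loan"],
    ["credit", "bank"],
]

print(cluster(combined_similarity(docs, alpha=0.5), threshold=0.4))
print(cluster(combined_similarity(docs, alpha=1.0), threshold=0.4))
```

With `alpha=1.0` (statistics only), the first two documents share no tokens and end up in separate clusters. With `alpha=0.5`, their shared (hypothetical) senses pull them into one cluster. This illustrates, in miniature, why a sense-based view can complement purely statistical similarity.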
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where SAC
Authors Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan