In this paper we process and analyze web search engine query and click data from the perspective of the documents (URL’s) selected. We initially define possible document categor...
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank no...
Farial Shahnaz, Michael W. Berry, V. Paul Pauca, R...
We study dimensionality reduction or feature selection in text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering. Keywords Random Indexing, K-tree, Dimensionality Reduction, B-tree, Search T...
Christopher M. De Vries, Lance De Vine, Shlomo Gev...