A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank no...
Farial Shahnaz, Michael W. Berry, V. Paul Pauca, R...
We study dimensionality reduction or feature selection in text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering. Keywords Random Indexing, K-tree, Dimensionality Reduction, B-tree, Search T...
Christopher M. De Vries, Lance De Vine, Shlomo Gev...
Extractive multi-document summarization is the task of choosing sentences from a set of documents to compose a summary text in response to a user query. We propose a generative ap...