In this research, a systematic study is conducted of four dimension reduction techniques for the text clustering problem, using five benchmark data sets. Of the four methods -- Ind...
Bin Tang, Michael A. Shepherd, Malcolm I. Heywood,...
Low-dimensional topic models have been proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools s...
High volume print jobs are getting more common due to the growing demand for personalized documents. In this context, Variable Data Printing (VDP) has become a useful tool for mar...
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
In distributed data mining models, adopting a flat node distribution model can affect scalability. To address the problem of modularity, flexibility and scalability, we propose...