Similarity search has been proved suitable for searching in very large collections of unstructured data objects. We are interested in efficient parallel query processing under si...
Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this w...
Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms to automatically classify te...
In this research, a systematic study is conducted of four dimension reduction techniques for the text clustering problem, using five benchmark data sets. Of the four methods -- Ind...
Bin Tang, Michael A. Shepherd, Malcolm I. Heywood,...
In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of compara...