A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require...
Abstract. The number of features to be considered in a text classification system is given by the size of the vocabulary and this is normally in the range of the tens or hundreds o...
David Vilar, Hermann Ney, Alfons Juan, Enrique Vid...
Classification algorithms and document representation approaches are two key elements for a successful document classification system. In the past, much work has been conducted to...
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences...
Huma Lodhi, John Shawe-Taylor, Nello Cristianini, ...
Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...