Document classification presents difficult challenges due to the sparsity and high dimensionality of text data and to the complex semantics of natural language. The tradi...
Abstract. The number of features to be considered in a text classification system is given by the size of the vocabulary, and this is normally in the range of the tens or hundreds o...
David Vilar, Hermann Ney, Alfons Juan, Enrique Vid...
Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ...
In this paper, we report on a study that was performed within the "Semantics of History" project on how descriptions of historical events are realized in different types...
This work presents an unsupervised solution to language identification. The method sorts multilingual text corpora on the basis of sentences into the different languages that are...