Abstract-- Text categorization is the task of assigning predefined categories to natural language text. With the widely used `bag of words' representation, previous researches...
Abstract. Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, it is possible to learn accurate autom...
In this paper we explore the potential of concept indexing with WordNet synsets for Text Categorization, in comparison with the traditional bag of words text representation model. ...
This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the ...