Supervised text classification is the task of automatically assigning a category label to a previously unlabeled text document. We start with a collection of pre-labeled examples ...
Abstract. Previous researches on advanced representations for document retrieval have shown that statistical state-of-the-art models are not improved by a variety of different ling...
Abstract. Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the usage of features on a higher semantic...
We present a study of new word identification (NWI) to improve the performance of a Chinese word segmenter. In this paper the distribution and types of new words are discussed emp...
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional featurevector-based learning methods, one could treat...
Sofus A. Macskassy, Haym Hirsh, Arunava Banerjee, ...