Charts are common graphic representation for scientific data in technical and business papers. We present a robust system for detecting and recognizing bar charts. The system incl...
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper,...
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from...
Deng Cai, Xiaofei He, Wei Vivian Zhang, Jiawei Han
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
In recent years several models have been proposed for text categorization. Within this, one of the widely applied models is the vector space model (VSM), where independence betwee...