This paper explores topic aspect (i.e., subtopic or facet) classification for English and Chinese collections. The evaluation model assumes a bilingual user who has found document...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
We describe an efficient technique to weigh word-based features in binary classification tasks and show that it significantly improves classification accuracy on a range of proble...
Justin Martineau, Tim Finin, Anupam Joshi, Shamit ...
We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acous...
Image classification is a well-studied and hard problem in computer vision. We extend a proven solution for classifying web spam to handle images. We exploit the link structure of...