Many applications require analyzing vast amounts of textual data, but the size and inherent noise of such data can make processing very challenging. One approach to these issues i...
David G. Underhill, Luke McDowell, David J. Marche...
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train diffe...
In this paper we study the effectiveness of using a phrase-based representation in e-mail classification, and the affect this approach has on a number of machine learning algorithm...
This paper addresses the problem of categorizing terms or lexical entities into a predefined set of semantic domains exploiting the knowledge available on-line in the Web. The prop...
Leonardo Rigutini, Ernesto Di Iorio, Marco Ernande...
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...