We consider classification of email messages as to whether or not they contain certain “email acts”, such as a request or a commitment. We show that exploiting the sequential ...
The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes quest...
We report an improved methodology for training classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine...
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabele...
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin ...
As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is the automatic classification of web documents into ...