Sciweavers

» Classifying Documents Without Labels
ICAC
2005
IEEE
14 years 3 months ago
PICCIL: Interactive Learning to Support Log File Categorization
Motivated by the real-world application of categorizing system log messages into defined situation categories, this paper describes an interactive text categorization method, PICC...
David Loewenstern, Sheng Ma, Abdi Salahshour
ICDAR
2003
IEEE
14 years 3 months ago
Automatic Feature Selection with Applications to Script Identification of Degraded Documents
Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We p...
Vitaly Ablavsky, Mark R. Stevens
WEBI
2004
Springer
14 years 3 months ago
Co-training with a Single Natural Feature Set Applied to Email Classification
When dealing with information overload from the Internet, such as the classification of Web pages and the filtering of email spam, a new technique called co-training has been shown...
Jason Chan, Irena Koprinska, Josiah Poon
ADCS
2004
13 years 11 months ago
Co-Training on Textual Documents with a Single Natural Feature Set
Co-training is a semi-supervised technique that allows classifiers to learn with fewer labelled documents by taking advantage of the more abundant unclassified documents. However, ...
Jason Chan, Irena Koprinska, Josiah Poon
ICPR
2006
IEEE
14 years 11 months ago
Pixel-Accurate Representation and Evaluation of Page Segmentation in Document Images
This paper presents a new representation and evaluation procedure of page segmentation algorithms and analyzes six widely-used layout analysis algorithms using the procedure. The ...
Daniel Keysers, Faisal Shafait, Thomas M. Breuel