Sciweavers

315 search results - page 40 / 63
» Text classification from positive and unlabeled documents
WWW
2004
ACM
14 years 8 months ago
Using URLs and table layout for web classification tasks
We propose new features and algorithms for automating Web-page classification tasks such as content recommendation and ad blocking. We show that the automated classification of We...
L. K. Shih, David R. Karger
TAL
2010
Springer
13 years 5 months ago
Summarization as Feature Selection for Document Categorization on Small Datasets
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
Emmanuel Anguiano-Hernández, Luis Villaseñ...
ICDAR
2011
IEEE
12 years 7 months ago
A Handwritten Character Extraction Algorithm for Multi-language Document Image
In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g. Chinese, ...
Yonghong Song, Guilin Xiao, Yuanlin Zhang, Lei Yan...
ICDAR
2011
IEEE
12 years 7 months ago
Language-Independent Text Lines Extraction Using Seam Carving
Abstract. In this paper, we present a novel language-independent algorithm for extracting text-lines from handwritten document images. Our algorithm is based on the seam carving ap...
Raid Saabni, Jihad El-Sana
WWW
2006
ACM
14 years 8 months ago
Using symbolic objects to cluster web documents
Web clustering is useful for several activities in the WWW, from automatically building web directories to improving retrieval performance. Nevertheless, due to the huge size of the...
Esteban Meneses, Oldemar Rodríguez-Rojas