Text Separation from Mixed Documents Using a Tree-Structured Classifier

Download www.visionopen.com

In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, we boost the tree using all training data at each node with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
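The idea of training every node on all samples with different weights can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the boosting rule, so the code assumes a decision stump as the binary weak learner and a hypothetical `damp` factor that down-weights samples routed to other branches instead of discarding them. All function names, the two toy features, and the class labels (0 = machine print, 1 = annotation, 2 = overlapping text) are assumptions made for illustration.

```python
from collections import defaultdict

def weighted_gini(labels, weights, mask):
    """Weighted Gini impurity and total weight of the samples selected by mask."""
    totals = defaultdict(float)
    for label, weight, selected in zip(labels, weights, mask):
        if selected:
            totals[label] += weight
    mass = sum(totals.values())
    if mass == 0.0:
        return 0.0, 0.0
    return 1.0 - sum((t / mass) ** 2 for t in totals.values()), mass

def fit_stump(X, y, w):
    """Binary weak learner: (score, feature, threshold) with the lowest weighted impurity."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [row[f] <= thr for row in X]
            right = [not m for m in left]
            g_left, m_left = weighted_gini(y, w, left)
            g_right, m_right = weighted_gini(y, w, right)
            score = (m_left * g_left + m_right * g_right) / (m_left + m_right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best  # None when no feature has two distinct values

def weighted_majority(y, w):
    """Class with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(y, w):
        totals[label] += weight
    return max(totals, key=totals.get)

def build_tree(X, y, routed=None, depth=0, max_depth=3, damp=0.05):
    """Each node is fit on ALL samples; samples not routed here get weight damp (assumed rule)."""
    if routed is None:
        routed = [True] * len(X)
    w = [1.0 if r else damp for r in routed]
    routed_labels = {label for label, r in zip(y, routed) if r}
    stump = None
    if depth < max_depth and len(routed_labels) > 1:
        stump = fit_stump(X, y, w)
    if stump is None:
        return {"label": weighted_majority(y, w)}
    _, f, thr = stump
    go_left = [row[f] <= thr for row in X]
    left = [r and g for r, g in zip(routed, go_left)]
    right = [r and not g for r, g in zip(routed, go_left)]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree(X, y, left, depth + 1, max_depth, damp),
        "right": build_tree(X, y, right, depth + 1, max_depth, damp),
    }

def predict(tree, row):
    """Follow the binary decisions from the root down to a leaf label."""
    while "label" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["label"]

# Toy data with two made-up features per text region (illustrative only).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
     [0.8, 0.2], [0.9, 0.1], [0.85, 0.15],
     [0.5, 0.9], [0.4, 0.8], [0.6, 0.85]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

tree = build_tree(X, y)
predictions = [predict(tree, row) for row in X]
print(predictions)
```

A plain decision tree corresponds to `damp=0`, where each child sees only its own subset. Keeping a small positive weight on the remaining samples is one simple way to let every node use the full training set, which is the property the abstract highlights.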

Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram

Binary Weak Learner | Computer Vision | ICPR 2010 | Training Data | Tree-structured Multiclass Classifier

» On Separation of English Numerals from Multilingual Document Images

» Image Classification to Improve Printing Quality of Mixed-Type Documents

» Learning to Separate Text Content and Style for Classification

» Semantic enrichment of text representation with wikipedia for text classification

» Distinguishing Mathematics Notation from English Text using Computational Geometry

» Classifying Documents Without Labels

» Localization of Digit Strings in Farsi/Arabic Document Images Using Structural Features and...

» Iterated Document Content Classification

Post Info

Added	13 Feb 2011
Updated	13 Feb 2011
Type	Journal
Year	2010
Where	ICPR
Authors	Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram

Sciweavers
