We report an improved methodology for training classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine...
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
In this paper, we present a compressed pattern matching method for searching user-queried words in CCITT Group 4 compressed document images without decompressing them. The feature...
There are obvious ways in which text and diagrams within a document should be coordinated: for instance, the placement of a diagram might influence the wording of the text. However...
We consider the problem of dealing with irrelevant votes when a multi-case classifier is built from an ensemble of binary classifiers. We show how run-off elections can be used to...