Although detecting text lines in machine printed documents is typically considered a solved problem, it is still a challenge to segment handwritten text lines in the general sense...
Topic models provide a powerful tool for analyzing large text collections by representing high dimensional data in a low dimensional subspace. Fitting a topic model given a set of...
We present an approach of how to automatically extract an XML document structure from a conceptual data model that describes the content of the document. We use UML class diagrams ...
Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images, attempting to improve accuracy by automatic unsupervised ...
We present a new method for classification with structured
latent variables. Our model is formulated using the
max-margin formalism in the discriminative learning literature.
We...