This paper reports a technique for retrieving machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a wo...
A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require...
This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document. The tool uses the XML represe...
Matthew R. B. Hardy, David F. Brailsford, Peter L....
The performance of document analysis systems significantly depends on knowledge about the application domain that can be exploited in the analysis process. Typically, one has to d...
We consider the generic hypermedia structure of a document to be a means of representing the document that allows it to be processed into a wide variety of presentations. Represen...
Lloyd Rutledge, Jacco van Ossenbruggen, Lynda Hard...