This paper describes a new versatile algorithm for correcting nonlinear distortions, such as curvature of book pages, in camera-based document processing. We introduce the idea of...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Semantic web researchers tend to assume that XML Schema and OWL-S are the correct means for representing the types, structure, and semantics of XML data used for documents and int...
Andruid Kerne, Zachary O. Toups, Blake Dworaczyk, ...
Automatic classification of documents is an important area of research with many applications in the fields of document searching, forensics and others. Methods to perform classif...