This paper introduces Saxon, a rule based document annotator that is capable of processing and annotating several document formats and media, both within and across documents. Fur...
Converting a scanned document to a binary format (black and white) is a key step in the digitization process. While many existing binarization algorithms operate robustly for well...
—This paper presents a framework to restore the 2D content printed on documents in the presence of geometric distortion and nonuniform illumination. Compared with text-based docu...
Michael S. Brown, Mingxuan Sun, Ruigang Yang, Lin ...
In automated text categorization, given a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that is able to achieve high...
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu...
The re-use of spoken word audio collections maintained by audiovisual archives is severely hindered by their generally limited access. The CHoral project, which is part of the CAT...
Willemijn Heeren, Franciska de Jong, Laurens van d...