We use a reliably annotated corpus to compare metrics of coherence based on Centering Theory with respect to their potential usefulness for text structuring in natural language ge...
Nikiforos Karamanis, Massimo Poesio, Chris Mellish...
This paper presents a novel block-based fast compression (BFC) algorithm for compound images that contain graphics, text and natural images. The images are divided into blocks, wh...
For people who use text-based web browsers, graphs, diagrams, and pictures are inaccessible. Yet, such diagrams are quite prominent in documents commonly found on the web. In this...
Kathleen F. McCoy, Sandra Carberry, Tom Roper, Nan...
In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Micr...
The goal of this work is to add the capability to segment documents containing text, graphics, and pictures in the open source OCR engine OCRopus. To achieve this goal, OCRopus'...
Amy Winder, Tim L. Andersen, Elisa H. Barney Smith