Separation of Textual and Non-textual Information within Mixed-Mode Documents

The increasing number of comfortable publishing systems leads to documents containing more than just textual information. Graphics and images are combined with text and often overlap one another. In this paper we present a robust algorithm for separating textual from non-textual information within multi-mode documents without recognizing individual characters. The approach generates connected components and classifies them as text or non-text. As a result, a credibility value is calculated for each connected component, expressing its similarity to text or graphics. Moreover, strings are generated that represent sequences of connected components classified as text. Strings can be aligned in any direction. The main processing steps of our system are connected component generation, neighborhood analysis, and the generation of strings.
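The three processing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which features or thresholds the system uses, so the size-based `text_credibility` score, the `max_side` and `max_gap` parameters, and the centre-distance grouping are all hypothetical stand-ins chosen only to show the shape of such a pipeline on a binary image given as a list of rows.

```python
from collections import deque


def connected_components(grid):
    """Find 8-connected groups of foreground pixels (value 1).

    Returns one bounding box (top, left, bottom, right) per component,
    in raster-scan order of each component's first pixel.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def text_credibility(box, max_side=4):
    """Hypothetical score in (0, 1]: small boxes look more like characters.

    Assumption for illustration only; the paper's real measure is not
    described in the abstract.
    """
    top, left, bottom, right = box
    side = max(bottom - top + 1, right - left + 1)
    return min(1.0, max_side / side)


def group_into_strings(boxes, max_gap=4.0):
    """Merge components whose box centres lie within max_gap pixels.

    Euclidean distance is used so that strings may run in any direction.
    """
    centres = [((t + b) / 2, (l + r) / 2) for t, l, b, r in boxes]
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            dy = centres[i][0] - centres[j][0]
            dx = centres[i][1] - centres[j][1]
            if (dy * dy + dx * dx) ** 0.5 <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())
```

In use, one would call `connected_components` on the binarized page, keep the boxes whose `text_credibility` exceeds some chosen cutoff, and pass those to `group_into_strings`. A real system would replace the size heuristic with the neighborhood analysis the abstract mentions.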

Frank Hönes, Rainer Zimmer

Comfortable Publishing Systems | Connected Component | MVA 1992 | MVA 2007 | Textual Information |

Added	07 Nov 2010
Updated	07 Nov 2010
Type	Conference
Year	1992
Where	MVA
Authors	Frank Hönes, Rainer Zimmer
