In this article, we introduce a new problem: the construction of multi-structured documents. We first offer an overview of existing solutions to the representation of such docum...
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
This paper concerns the document multi-structuring issue. For various use objectives, many distinct structures may be defined simultaneously for the same original document. For ex...
Noureddine Chatti, Sylvie Calabretto, Jean-Marie P...
Structured documents, especially the XML documents, are made up of a few logical components, such as title, sections, subsections and paragraphs. The components in each structured...
Processing a twig pattern query in XML document includes structural search and content search. Most existing algorithms only focus on structural search. They treat content nodes th...