A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives



Mass digitization of document collections, followed by further processing and semantic annotation, is an increasingly common activity among libraries and archives, serving preservation, browsing and navigation, and search. In this paper we propose a software architecture for converting high volumes of document collections into semantically annotated digital libraries. The proposed architecture recognizes two sources of knowledge in the conversion pipeline: document images and humans. The Image Analysis module and the Correction and Validation module cover the initial conversion stages. In the former, information is automatically extracted from document images. The latter involves human intervention at a technical level to define workflows and to validate the image processing results. The second stage, represented by the Knowledge Capture modules, requires information specific to the particular knowledge domain and generally calls for expert practitioners. These two princ...
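The staged flow described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the class `Document`, the functions `image_analysis`, `correction_and_validation`, `knowledge_capture` and `convert`, and the rule that knowledge capture only accepts validated results are all assumptions made for illustration. Only the three module names and their order come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical record: a page image plus what the pipeline learns about it."""
    image_id: str
    extracted: Dict[str, str] = field(default_factory=dict)  # automatic results
    validated: bool = False                                   # set after human review
    annotations: List[str] = field(default_factory=list)     # domain knowledge


def image_analysis(doc: Document) -> Document:
    # Stand-in for automatic extraction from the document image.
    # A real module would run image processing here; this only stores a stub value.
    doc.extracted["text"] = f"<content extracted from {doc.image_id}>"
    return doc


def correction_and_validation(
    doc: Document, review: Callable[[Dict[str, str]], Dict[str, str]]
) -> Document:
    # A technical operator reviews (and possibly corrects) the automatic results.
    doc.extracted = review(doc.extracted)
    doc.validated = True
    return doc


def knowledge_capture(
    doc: Document, annotate: Callable[[Document], List[str]]
) -> Document:
    # Assumption of this sketch: domain experts only annotate validated results.
    if not doc.validated:
        raise ValueError("knowledge capture expects validated results")
    doc.annotations.extend(annotate(doc))
    return doc


def convert(
    image_id: str,
    review: Callable[[Dict[str, str]], Dict[str, str]],
    annotate: Callable[[Document], List[str]],
) -> Document:
    # Initial conversion stages, then the knowledge-capture stage.
    doc = image_analysis(Document(image_id))
    doc = correction_and_validation(doc, review)
    return knowledge_capture(doc, annotate)
```

Passing the human steps in as callables (`review`, `annotate`) keeps the two knowledge sources named in the abstract, images and humans, visibly separate. For example, `convert("page-001", review=lambda fields: fields, annotate=lambda d: ["example-tag"])` runs one hypothetical page through all three stages.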

Josep Lladós, Dimosthenis Karatzas, Joan Mas, Gemma Sánchez


Document | Document Collections | Document Image | JUCS 2008 |


» A NoCompromises Architecture for Digital Document Preservation

» A Semantic Web Powered Distributed Digital Library System

» Multichannel publishing of interactive multimedia presentations

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2008
Where	JUCS
Authors	Josep Lladós, Dimosthenis Karatzas, Joan Mas, Gemma Sánchez


