Intelligent Parsing of Scanned Volumes for Web Based Archives

The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be converted into digital format, and semantic information needs to be parsed and managed for effective retrieval. In this work, we attempt to solve the problems faced by current web based archives, where large scale repositories of electronic resources have been built from scanned volumes. Specifically, we focus on the scientific domain and target scanned volumes of scientific publications. Our goal is to automate the semantic processing of scanned volumes, an important and challenging step towards efficient retrieval of content within scanned volumes. We tackle the problem by designing a machine learning-based method to extract multi-level metadata about the content of scanned volumes. We combine image and text information within scanned volumes for intelligent parsing. We developed a system and tested it with re...

Xiaonan Lu, James Ze Wang, C. Lee Giles

Large Scale Repositories | Semantic Computing | Semantic Information Need | SEMCO 2007 | Target Scanned Volumes

Added	04 Jun 2010
Updated	04 Jun 2010
Type	Conference
Year	2007
Where	SEMCO
Authors	Xiaonan Lu, James Ze Wang, C. Lee Giles
