Automated Building of OAI Compliant Repository from Legacy Collection

In this paper, we report on our experience creating an automated, human-assisted process to extract metadata from documents in a large (more than 100,000 documents), dynamically growing collection. Such a collection can be expected to be heterogeneous in two ways: statically heterogeneous (it contains documents in a variety of formats) and dynamically heterogeneous (it is likely to acquire new documents in formats unlike any previous acquisitions). Our eventual goal is to fully automate metadata extraction for 80% of the documents and to reduce by 80% the time needed to generate metadata for the remaining documents. Here we describe our process of first classifying documents into equivalence classes, for each of which we can then apply a rule-based approach to extract metadata. Our rule-based approach differs from others insofar as it separates the rule-interpreting engine from a template of rules: the templates vary among classes, but the engine is the same. We have evaluated our approa...
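The abstract does not give the rule syntax or the classification method, so the following is only a minimal Python sketch, under assumed formats, of the separation it describes: one generic engine (`extract`) interprets per-class rule templates, and a toy classifier (`classify`) assigns a document to an equivalence class first. `Rule`, `CLASSES`, the signature regexes, and the sample document are hypothetical illustrations, not the authors' implementation; regex "signatures" stand in for whatever classifier the paper uses. The field names borrow Dublin Core elements (title, creator, date) because OAI-PMH requires repositories to expose unqualified Dublin Core.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical rule format: capture group 1 of `pattern` on the first
    matching line, searching only the first `max_line` lines when set."""
    pattern: str
    max_line: Optional[int] = None


# A template maps each metadata field to rules tried in order; first hit wins.
Template = Dict[str, List[Rule]]


def extract(lines: List[str], template: Template) -> Dict[str, str]:
    """Generic rule-interpreting engine: the same code serves every class."""
    metadata: Dict[str, str] = {}
    for field, rules in template.items():
        for rule in rules:
            window = lines if rule.max_line is None else lines[: rule.max_line]
            match = next(
                (m for m in (re.search(rule.pattern, ln) for ln in window) if m),
                None,
            )
            if match:
                metadata[field] = match.group(1).strip()
                break  # this field is filled; skip its remaining rules
    return metadata


# Hypothetical equivalence classes: a signature regex that identifies the
# class, paired with that class's template. Only the templates differ.
CLASSES: Dict[str, Tuple[str, Template]] = {
    "tech_report": (
        r"^TECHNICAL REPORT",
        {
            "title": [Rule(r"^Title:\s*(.+)$", max_line=5)],
            "creator": [Rule(r"^Author:\s*(.+)$", max_line=10)],
            "date": [Rule(r"^Date:\s*(.+)$")],
        },
    ),
    "thesis": (
        r"(?i)a thesis submitted",
        {
            "title": [Rule(r"^(\S.*)$", max_line=1)],  # first line is the title
            "creator": [Rule(r"^by\s+(.+)$", max_line=6)],
        },
    ),
}


def classify(lines: List[str], head: int = 10) -> Optional[str]:
    """Return the first class whose signature occurs in the first `head`
    lines, or None if no class matches (such documents go to a human)."""
    for name, (signature, _) in CLASSES.items():
        if any(re.search(signature, ln) for ln in lines[:head]):
            return name
    return None


# Usage on a made-up, already text-extracted document.
doc = [
    "TECHNICAL REPORT 42",
    "Title: Sample Report on Example Data",
    "Author: A. Author",
    "Date: 2005-03",
]
cls = classify(doc)
if cls is None:
    print("no class matched: route to human-assisted extraction")
else:
    print(cls, extract(doc, CLASSES[cls][1]))
# prints: tech_report {'title': 'Sample Report on Example Data', 'creator': 'A. Author', 'date': '2005-03'}
```

In this sketch, supporting a newly acquired document format means adding a template entry to `CLASSES` rather than changing `extract`, which mirrors the engine/template split the abstract describes. Documents for which `classify` returns `None` are a natural hand-off point for the human-assisted step.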

Jianfeng Tang, Kurt Maly, Steven J. Zeil, Mohammad Zubair

Documents | ELPUB 2006 | Information Management | Metadata | Rule-based Approach

Post Info

Added	13 Jun 2010
Updated	13 Jun 2010
Type	Conference
Year	2006
Where	ELPUB
Authors	Jianfeng Tang, Kurt Maly, Steven J. Zeil, Mohammad Zubair