Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadat...
Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zh...
The SPIRIT search engine provides a test bed for the development of web search technology that is specialised for access to geographical information. Major components inc...
Christopher B. Jones, Alia I. Abdelmoty, David Fin...
This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agen...
Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mo...
An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable provided that instances of...
Kurt Maly, Steven J. Zeil, Mohammad Zubair, Ashraf...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...