This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agen...
Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mo...
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In ...
Floriana Esposito, Donato Malerba, Francesca A. Li...
Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an ...
The graph-based ranking algorithm has recently been exploited for multi-document summarization by making use of only the sentence-to-sentence relationships in the documents, under...
The two observations that 1) many XML documents are stored in a database or generated from data stored in a database and 2) processing these documents with XSL stylesheet processo...