Sciweavers

52 search results - page 7 / 11
» Bringing taxonomic structure to large digital libraries
IJCAI
2003
13 years 8 months ago
Integrating Information to Bootstrap Information Extraction from Web Sites
In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention. Learning is seeded b...
Fabio Ciravegna, Alexiei Dingli, David Guthrie, Yo...
ERCIMDL
2007
Springer
14 years 1 months ago
The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems
Using keyword search to find relevant objects in digital libraries often returns result sets that are far too large. Based on the metadata associated with such objects, the faceted sear...
Jörg Diederich, Wolf-Tilo Balke
AND
2010
13 years 5 months ago
Document: a useful level for facing noisy data
In this paper we will present a set of experiments using large digitalized collections of books to show that logical structures can be extracted with good quality when working at ...
Hervé Déjean, Jean-Luc Meunier
ICDAR
2011
IEEE
12 years 7 months ago
Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments
Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground t...
C. Clausner, Stefan Pletschacher, Apostolos Antona...
CIKM
2009
Springer
14 years 1 months ago
The impact of document structure on keyphrase extraction
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...