Sciweavers

602 search results - page 42 / 121
» Integrating Data and Probabilistically Structured Text Docum...
ICDE
2007
IEEE
16 years 5 months ago
SQL Queries Over Unstructured Text Databases
Text documents often embed data that is structured in nature. By processing a text database with information extraction systems, we can define a variety of structured "relati...
Alpa Jain, AnHai Doan, Luis Gravano
ISEMANTICS
2010
15 years 5 months ago
sTeX+: a system for flexible formalization of linked data
We present the sTeX system, a semantic extension of LaTeX, that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc docu...
Andrea Kohlhase, Michael Kohlhase, Christoph Lange...
DOCENG
2009
ACM
15 years 10 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
IJMMS
2008
15 years 3 months ago
Ontology-based information extraction and integration from heterogeneous data sources
In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain t...
Paul Buitelaar, Philipp Cimiano, Anette Frank, Mat...
ICDE
2010
IEEE
16 years 3 months ago
Fast In-Memory XPath Search using Compressed Indexes
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicate...
Diego Arroyuelo, Francisco Claude, Sebastian Manet...
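
The three text predicates named in the abstract above (equal, contains, starts-with) can be illustrated with a naive scan over a small XML document. This is only a sketch of what the predicates mean, not the compressed-index technique of the paper; the sample document, the `text_search` helper, and its parameters are hypothetical, and Python's standard `xml.etree.ElementTree` is used for parsing only, since it does not evaluate XPath `contains()` or `starts-with()` itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """<library>
  <book><title>XPath Basics</title></book>
  <book><title>XML Indexing</title></book>
  <book><title>Compressed Text Search</title></book>
</library>"""


def text_search(root, tag, predicate, pattern):
    """Return the string values of <tag> elements that satisfy a text predicate.

    Mirrors the semantics of the XPath tests . = p, contains(., p) and
    starts-with(., p) by scanning every matching element (no index).
    """
    tests = {
        "equal": lambda s: s == pattern,
        "contains": lambda s: pattern in s,
        "starts-with": lambda s: s.startswith(pattern),
    }
    test = tests[predicate]
    hits = []
    for el in root.iter(tag):
        # The XPath string value of an element is the concatenation
        # of all its descendant text nodes.
        value = "".join(el.itertext())
        if test(value):
            hits.append(value)
    return hits


root = ET.fromstring(DOC)
equal_hits = text_search(root, "title", "equal", "XML Indexing")
contains_hits = text_search(root, "title", "contains", "Text")
starts_hits = text_search(root, "title", "starts-with", "X")
```

Each call visits every candidate element, so the cost grows with the size of the document; the abstract motivates an index-based approach for this kind of query.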