Sciweavers

» Extracting Partial Structures from HTML Documents
PVLDB
2010
13 years 6 months ago
ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
Talel Abdessalem, Bogdan Cautis, Nora Derouiche
ICDE
2010
IEEE
14 years 8 months ago
Viewing a World of Annotations through AnnoVIP
The proliferation of electronic content has notably led to the appearance of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...
Konstantinos Karanasos, Spyros Zoupanos
IJDAR
2008
13 years 8 months ago
Matching word images for content-based retrieval from printed document images
As a large quantity of document images is being archived by digital libraries, there is a need for efficient search strategies to make them available as per users' informatio...
Million Meshesha, C. V. Jawahar
SIGIR
2009
ACM
14 years 3 months ago
Extracting structured information from user queries with semi-supervised conditional random fields
When search is against structured documents, it is beneficial to extract information from user queries in a format that is consistent with the backend data structure. As one step...
Xiao Li, Ye-Yi Wang, Alex Acero
CIKM
2009
Springer
14 years 3 months ago
The impact of document structure on keyphrase extraction
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...