Sciweavers

1947 search results - page 13 / 390
» On the Automatic Extraction of Data from the Hidden Web
WWW
2005
ACM
14 years 9 months ago
Web data extraction based on partial tree alignment
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Yanhong Zhai, Bing Liu
PVLDB
2010
13 years 7 months ago
ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
Talel Abdessalem, Bogdan Cautis, Nora Derouiche
INLG
2010
Springer
13 years 6 months ago
Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation
Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the ...
Anja Belz, Eric Kow
SIGIR
2005
ACM
14 years 2 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
WEBDB
2010
Springer
14 years 1 months ago
Redundancy-Driven Web Data Extraction and Integration
A large number of web sites publish pages containing structured information about recognizable concepts, but these data are only partially used by current applications. Although s...
Paolo Papotti, Valter Crescenzi, Paolo Merialdo, M...