Sciweavers

» Extracting Structured Data from Web Pages
EMNLP
2008
13 years 10 months ago
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Lei Shi, Ming Zhou
CAISE
2003
Springer
14 years 1 month ago
Extending an on-line information site with accurate domain-dependent extracts from the World Wide Web
This paper describes a new procedure that has been developed for extending an existing on-line information system about The Voyages of the Beagle with information collected automat...
Enrique Alfonseca, Pilar Rodríguez
SIGMOD
2010
ACM
13 years 8 months ago
Optimizing content freshness of relations extracted from the web using keyword search
An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data acc...
Mohan Yang, Haixun Wang, Lipyeow Lim, Min Wang
VLDB
2011
ACM
13 years 3 months ago
Harvesting relational tables from lists on the web
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more seman...
Hazem Elmeleegy, Jayant Madhavan, Alon Y. Halevy
CIKM
1998
Springer
14 years 25 days ago
Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...