Sciweavers

1947 search results - page 6 / 390
» On the Automatic Extraction of Data from the Hidden Web
SIGIR
2004
ACM
14 years 2 months ago
Query-related data extraction of hidden web documents
A large amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dyna...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...
DEBU
2000
13 years 8 months ago
Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
CIKM
2003
Springer
14 years 1 month ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...
SEMCO
2009
IEEE
14 years 3 months ago
An Algebraic Language for Semantic Data Integration on the Hidden Web
Semantic integration in the hidden Web is an emerging area of research where traditional assumptions do not always hold. Frequent changes, conflicts and the sheer size of the hid...
Shazzad Hosain, Hasan M. Jamil
PAKM
2004
13 years 10 months ago
Automatic Generation of Taxonomies from the WWW
In this paper we present a methodology to extract information from the Web to build a taxonomy of terms and Web resources for a given domain. This taxonomy represents a hierarchy o...
David Sánchez, Antonio Moreno