Sciweavers

563 search results - page 36 / 113
» Crawling the web for structured documents
Sort
View
WEBDB
1999
Springer
196views Database» more  WEBDB 1999»
15 years 8 months ago
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant
WWW
2005
ACM
16 years 4 months ago
Processing link structures and linkbases on the web
Hyperlinks are an essential feature of the World Wide Web, highly responsible for its success. XLink improves on HTML's linking capabilities in several ways. In particular, l...
François Bry, Michael Eckert
125
Voted
WWW
2007
ACM
16 years 4 months ago
Integrating web directories by learning their structures
Documents in the Web are often organized using category trees by information providers (e.g. CNN, BBC) or search engines (e.g. Google, Yahoo!). Such category trees are commonly kn...
Christopher C. Yang, Jianfeng Lin
CIKM
2011
Springer
14 years 4 months ago
Integrating and querying web databases and documents
There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semistructured (documents). A key challenge is to integrat...
Carlos Garcia-Alvarado, Carlos Ordonez
ISEC
2001
Springer
180views ECommerce» more  ISEC 2001»
15 years 8 months ago
i-Cube: A Tool-Set for the Dynamic Extraction and Integration of Web Data Content
Over the past decade the Internet has evolved into the largest public community in the world. It provides a wealth of data content and services in almost every field of science, t...
Frankie Poon, Kostas Kontogiannis