Sciweavers

472 search results - page 56 / 95
» Crawling the Hidden Web
WSDM
2010
ACM
204 views, Data Mining
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
ICDE
2008
IEEE
153 views, Database
14 years 9 months ago
Automatically Extracting Form Labels
We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to ...
Hoa Nguyen, Eun Yong Kang, Juliana Freire
WWW
2009
ACM
14 years 8 months ago
Triplify: light-weight linked data publication from relational databases
In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...
Sören Auer, Sebastian Dietzold, Jens Lehmann,...
SAC
2005
ACM
14 years 1 months ago
Pollock: automatic generation of virtual web services from web sites
As the usage of Web Services proliferates dramatically, new tools to help quickly generate web services are needed. In this paper, we propose a methodology that helps to automatic...
Yi-Hsuan Lu, Yoojin Hong, Jinesh Varia, Dongwon Le...
IJWIS
2007
77 views
13 years 7 months ago
World's first web census
Purpose: To measure the exact size of the World Wide Web (i.e., a census). The measure used is the number of publicly accessible web servers on port 80. Design/methodology/app...
Darcy G. Benoit, André Trudel