Sciweavers

41 search results - page 6 / 9
» Google's Deep Web crawl
PVLDB
2008
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
FC
2010
Springer
13 years 11 months ago
Measuring the Perpetrators and Funders of Typosquatting
We describe a method for identifying “typosquatting”, the intentional registration of misspellings of popular website addresses. We estimate that at least 938 000 typosquatting...
Tyler Moore, Benjamin Edelman
LAWEB
2003
IEEE
14 years 22 days ago
Finding Buying Guides with a Web Carnivore
Research on buying behavior indicates that buying guides perform an important role in the overall buying process. However, while many buying guides can be found on the Web, findin...
Reiner Kraft, Raymie Stata
ICMCS
2009
IEEE
13 years 5 months ago
Web image mining using concept sensitive Markov stationary features
With the explosive growth of web resources, mining semantically relevant images efficiently has become a challenging and necessary task. In this paper, we propose a concept sens...
Chunjie Zhang, Jing Liu, Hanqing Lu, Songde Ma
JIS
2008
13 years 7 months ago
A three-year study on the freshness of web search engine databases
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Y...
Dirk Lewandowski