The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
We describe a method for identifying “typosquatting”, the intentional registration of misspellings of popular website addresses. We estimate that at least 938 000 typosquatting...
Research on buying behavior indicates that buying guides perform an important role in the overall buying process. However, while many buying guides can be found on the Web, findin...
With the explosive growth of web resources, mining semantically relevant images efficiently has become a challenging and necessary task. In this paper, we propose a concept sens...
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Y...