Sciweavers

60 search results - page 8 / 12
» SiteRank-Based Crawling Ordering Strategy for Search Engines
PVLDB
2008
141 views
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
KDD
2008
ACM
183 views, Data Mining
14 years 8 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
ICCBR
2005
Springer
14 years 1 month ago
Advertising Strategies: Learning Competence through Cooperative Game Playing
In this paper we consider the competition on the Internet between information providers to maximise their exposure to a relevant audience. Spammers and Search engine gamers adopt a...
Paolo Avesani, Conor Hayes
ADBIS
2001
Springer
114 views, Database
14 years 6 days ago
Evaluation of Join Strategies for Distributed Mediation
Three join algorithms are evaluated in an environment with distributed main-memory based mediators and data sources. A streamed ship-out join ships bulks of tuples to a mediator ne...
Vanja Josifovski, Timour Katchaounov, Tore Risch
WSDM
2012
ACM
243 views, Data Mining
12 years 3 months ago
No search result left behind: branching behavior with browser tabs
Today’s Web browsers allow users to open links in new windows or tabs. This action, which we call ‘branching’, is sometimes performed on search results when the user plans t...
Jeff Huang, Thomas Lin, Ryen W. White