Sciweavers

Web Crawling
KDD
2008
ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
SIGIR
2005
ACM
Server selection methods in hybrid portal search
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
David Hawking, Paul Thomas
WSDM
2010
ACM
Large Scale Query Log Analysis of Re-Finding
Although Web search engines are targeted towards helping people find new information, people regularly use them to re-find Web pages they have seen before. Researchers have noted ...
Jaime Teevan, Sarah K. Tyler
IPM
2007
p2pDating: Real life inspired semantic overlay networks for Web search
We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independe...
Josiane Xavier Parreira, Sebastian Michel, Gerhard...
WWW
2004
ACM
Combining link and content analysis to estimate semantic similarity
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Filippo Menczer