Finding Potential Seeds through Rank Aggregation of Web Searches

This paper presents a potential seed selection algorithm for web crawlers using a gain-share scoring approach. Initially, we consider a set of arbitrarily chosen tourism queries. Each query is submitted to N selected commercial search engines (SEs); the top m search results from each SE are obtained, and each of these m results is manually evaluated and assigned a relevance score. For each of the m results, a gain-share score is computed using its hyperlink structure across the N ranked lists. A gain score is computed for each link present in each of the m results, and a portion of that gain score is propagated to the share score of each of the m results. The updated share scores of the m results determine the potential set of seed URLs for web crawling. Experimental results on tourism-related web data illustrate the effectiveness of the proposed seed selection algorithm.
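The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that the gain of a link is the number of ranked lists in which the link's target appears, that a fixed fraction `alpha` of the summed gain is added to the manually assigned relevance score to form the share score, and that the top `k` results by share score become seeds. The function name `select_seeds` and the parameters `alpha` and `k` are hypothetical.

```python
def select_seeds(ranked_lists, relevance, outlinks, alpha=0.5, k=3):
    """Rank candidate seed URLs by a hypothetical gain-share score.

    ranked_lists: list of N lists of result URLs, one list per search engine.
    relevance:    dict mapping URL -> manually assigned relevance score.
    outlinks:     dict mapping URL -> iterable of URLs the result links to.
    alpha:        assumed fraction of link gain propagated to the share score.
    k:            number of seed URLs to return.
    """
    # Assumption: a URL's "gain" is the number of ranked lists it appears in.
    appearances = {}
    for results in ranked_lists:
        for url in set(results):
            appearances[url] = appearances.get(url, 0) + 1

    share = {}
    for url in appearances:
        # Sum the gain of every distinct outgoing link, ignoring self-links.
        targets = set(outlinks.get(url, ())) - {url}
        gain = sum(appearances.get(target, 0) for target in targets)
        # Assumption: share score = relevance + propagated portion of the gain.
        share[url] = relevance.get(url, 0.0) + alpha * gain

    # Highest share score first; ties are broken alphabetically by URL.
    return sorted(share, key=lambda u: (-share[u], u))[:k]
```

With two ranked lists `[["a", "b", "c"], ["b", "c", "d"]]`, a result that links to pages appearing in both lists accumulates more gain than one that links only to pages outside the lists, so it moves up the seed ranking even when its own relevance score is modest.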

Rajendra Prasath, Pinar Öztürk

Commercial Search Engines | Pattern Recognition | PREMI 2011 | Relevance Score | Selection Algorithm

» Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations

» Wisdom of the ages toward delivering the childrens web with the linkbased agerank algorith...

Post Info

Added	17 Sep 2011
Updated	17 Sep 2011
Type	Journal
Year	2011
Where	PREMI
Authors	Rajendra Prasath, Pinar Öztürk

Sciweavers
