News Page Discovery Policy for Instant Crawlers

Many news pages with strict freshness requirements are published on the internet every day. Instant crawlers should download them immediately; otherwise, they quickly become outdated. In the past, instant crawlers downloaded pages only from a manually generated list of news websites. Because news websites do not publish news pages exclusively, bandwidth is wasted on downloading non-news pages. This paper proposes a novel approach to discovering news pages, consisting of seed selection and news URL prediction based on user behavior analysis. Empirical studies on a two-month user access log show that our approach outperforms the traditional approach in both precision and recall.
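The abstract evaluates the approach by precision and recall. As a minimal sketch of how those two measures are computed for a crawler's output, assuming the downloaded URLs and the true news URLs are available as plain sets: the function name `precision_recall` and the example.com URLs are hypothetical illustrations, not taken from the paper, and the paper's actual evaluation setup is not described here.

```python
def precision_recall(downloaded, news_pages):
    """Return (precision, recall) for a set of downloaded URLs.

    precision = fraction of downloaded URLs that are news pages
    recall    = fraction of news pages that were downloaded
    """
    downloaded, news_pages = set(downloaded), set(news_pages)
    hits = len(downloaded & news_pages)  # news pages actually fetched
    precision = hits / len(downloaded) if downloaded else 0.0
    recall = hits / len(news_pages) if news_pages else 0.0
    return precision, recall


# Hypothetical data: four pages fetched, one of which is not news,
# and one real news page that the crawler missed.
downloaded = [
    "http://example.com/news/1",
    "http://example.com/news/2",
    "http://example.com/about",
    "http://example.org/news/3",
]
news_pages = [
    "http://example.com/news/1",
    "http://example.com/news/2",
    "http://example.org/news/3",
    "http://example.net/news/4",
]

precision, recall = precision_recall(downloaded, news_pages)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In these terms, downloading non-news pages from a news website lowers precision (wasted bandwidth), while news pages published outside the crawled list lower recall.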

Yong Wang, Yiqun Liu, Min Zhang, Shaoping Ma

AIRS 2008 | Approach Includes Seed | Information Retrieval | Instant Crawlers | User Access Log |

Post Info

Added	01 Jun 2010
Updated	01 Jun 2010
Type	Conference
Year	2008
Where	AIRS
Authors	Yong Wang, Yiqun Liu, Min Zhang, Shaoping Ma
