Focused crawling for both topical relevance and quality of medical information

Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules. These can be expensive to maintain and often provide incomplete coverage of Web resources. On the other hand, health information obtained through whole-of-Web search may not be scientifically based and can be potentially harmful. To address problems of cost, coverage and quality, we built a focused crawler for the mental health topic of depression, which was able to selectively fetch higher quality relevant information. We found that the relevance of unfetched pages can be predicted based on link anchor context, but the quality cannot. We therefore estimated quality of the entire linking page, using a learned IR-style query of weighted single words and word pairs, and used this to predict the quality of its links. The overall crawler priority was determined by the product of link relevance and source quality. We evaluated our crawler against baseline crawls using both rel...
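
The priority rule in the abstract (link relevance multiplied by the quality of the linking page) can be illustrated with a small crawl frontier. This is only a sketch under stated assumptions, not the authors' implementation: the names `quality_score`, `Frontier` and `FrontierEntry`, the example weights, and the choice to treat "word pairs" as adjacent words are all hypothetical, and the abstract does not say how the weights were learned or how the scores were normalised.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # heapq is a min-heap, so the negated priority is stored
    # to make the highest-priority link pop first.
    neg_priority: float
    url: str = field(compare=False)


def quality_score(text, term_weights, pair_weights):
    """Hypothetical quality estimate for a linking page.

    Sums the weights of distinct matched words and distinct adjacent
    word pairs (adjacency is an assumption), clamped at zero.
    """
    words = text.lower().split()
    score = sum(term_weights.get(w, 0.0) for w in set(words))
    pairs = set(zip(words, words[1:]))
    score += sum(pair_weights.get(p, 0.0) for p in pairs)
    return max(score, 0.0)


class Frontier:
    """Crawl frontier ordered by link relevance times source quality."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, link_relevance, source_quality):
        # Skip URLs that were already queued.
        if url in self._seen:
            return
        self._seen.add(url)
        priority = link_relevance * source_quality
        heapq.heappush(self._heap, FrontierEntry(-priority, url))

    def pop(self):
        # Returns the best (url, priority) pair still in the frontier.
        entry = heapq.heappop(self._heap)
        return entry.url, -entry.neg_priority


# Illustrative weights only; real weights would be learned from labelled pages.
term_weights = {"trial": 0.5, "miracle": -1.0}
pair_weights = {("controlled", "trial"): 1.0}

page_quality = quality_score(
    "randomised controlled trial of cognitive therapy",
    term_weights,
    pair_weights,
)

frontier = Frontier()
frontier.add("http://example.org/a", 0.9, page_quality)
frontier.add("http://example.org/b", 0.4, page_quality)
frontier.add("http://example.org/c", 0.9, 0.2)
best_url, best_priority = frontier.pop()
```

Because the two scores are multiplied, a link needs both a relevant anchor context and a good-quality source page to rank highly; a link with high relevance from a low-quality page (the third URL above) drops to the back of the queue.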

Thanh Tin Tang, David Hawking, Nick Craswell, Kathleen Griffiths

CIKM 2005 | Focused Crawler | Quality Focused Crawler | Relevance Focused Crawler |

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	CIKM
Authors	Thanh Tin Tang, David Hawking, Nick Craswell, Kathleen Griffiths
