ADCS
2004

Focused Crawling in Depression Portal Search: A Feasibility Study

Previous work on domain-specific search services in the area of depressive illness has documented the significant human cost required to set up and maintain closed-crawl parameters. It also showed that domain coverage is much lower than that of whole-of-web search engines. Here we report on the feasibility of techniques for achieving greater coverage at lower cost. We found that acceptably effective crawl parameters could be derived automatically from a DMOZ depression category list, with a dramatic saving in effort. We also found evidence that focused crawling could be effective in this domain: relevant documents from diverse sources are extensively interlinked; many outgoing links from a constrained crawl based on DMOZ lead to additional relevant content; and we were able to achieve reasonable precision (88%) and recall (68%) using a J48-derived predictive classifier operating only on URL words, anchor text and text content adjacent to referring links. Future directions include implement...
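
As a rough illustration of the link-level evidence the abstract mentions (URL words, anchor text and text adjacent to the referring link), the sketch below collects those three word sets for a single link and shows how precision and recall are computed from binary predictions. It is not the authors' implementation: the function names, the tokenisation rule and the example URL are assumptions made for illustration, and no J48 classifier is trained here.

```python
import re
from urllib.parse import urlparse


def words(text):
    """Lowercase alphabetic tokens from free text (assumed tokenisation)."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


def url_words(url):
    """Words taken from the host, path and query of a URL."""
    parts = urlparse(url)
    return words(" ".join([parts.netloc, parts.path, parts.query]))


def link_features(url, anchor_text, context_text):
    """Hypothetical feature record for one link: three separate word sets."""
    return {
        "url": set(url_words(url)),
        "anchor": set(words(anchor_text)),
        "context": set(words(context_text)),
    }


def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Example link (made up for illustration).
features = link_features(
    "http://www.example.org/depression/treatment.html",
    "Treating depression",
    "Read more about options for treating depression in adults.",
)
```

A classifier such as a decision tree would take records like `features` as input and predict whether the link target is relevant; `precision_recall` then compares those predictions against human relevance judgements.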
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where ADCS
Authors Thanh Tin Tang, David Hawking, Nick Craswell, Ramesh S. Sankaranarayana