Sciweavers

Search results for "A Simple Focused Crawler" (1275 results, page 6 of 255)
NSDI
2010
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
ICWSM
2010
Coping With Noise in a Real-World Weblog Crawler and Retrieval System
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise remo...
James Lanagan, Paul Ferguson, Neil O'Hare, Alan F....
VLDB
2000
ACM
Focused Crawling Using Context Graphs
Maintaining currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size and dynamic content of the web. Focused crawlers aim...
Michelangelo Diligenti, Frans Coetzee, Steve Lawre...
ICML
2007
IEEE
Focused crawling with scalable ordinal regression solvers
In this paper we propose a novel, scalable, clustering-based Ordinal Regression formulation, which is an instance of a Second Order Cone Program (SOCP) with one Second Order Cone ...
Rashmin Babaria, J. Saketha Nath, S. Krishnan, K. ...
WIDM
2004
ACM
Probabilistic models for focused web crawling
A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen
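The abstract above describes the core loop of a focused crawler. Each newly seen URL is scored using information from pages that have already been crawled, and the most promising URLs are fetched first. The following is a minimal best-first sketch of that loop. It assumes an in-memory toy link graph and a simple keyword-overlap score. The names `relevance`, `estimate`, `Frontier`, and `crawl`, along with the blend weight `alpha`, are hypothetical. The scoring is a stand-in heuristic, not the probabilistic model proposed by Liu, Milios, and Janssen.

```python
import heapq
import itertools


def relevance(text: str, topic_terms: set) -> float:
    """Hypothetical heuristic: fraction of topic terms that appear in the text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def estimate(parent_score: float, anchor_text: str, topic_terms: set, alpha: float = 0.5) -> float:
    """Hypothetical estimate for an unseen URL: blend the relevance of the page
    that links to it with the relevance of the link's anchor text."""
    return alpha * parent_score + (1 - alpha) * relevance(anchor_text, topic_terms)


class Frontier:
    """Best-first URL frontier: the URL with the highest estimated relevance pops first."""

    def __init__(self):
        self._heap = []
        self._seen = set()                 # never enqueue the same URL twice
        self._counter = itertools.count()  # tie-breaker: equal scores pop in insertion order

    def push(self, url: str, score: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the maximum first
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


def crawl(seed, graph, pages, topic_terms, max_pages=10):
    """Crawl a toy graph {url: [(link, anchor_text), ...]} whose page text is in `pages`.
    Returns (url, page_relevance) pairs in the order the pages were fetched."""
    frontier = Frontier()
    frontier.push(seed, 1.0)
    order = []
    while frontier and len(order) < max_pages:
        url, _ = frontier.pop()
        page_score = relevance(pages.get(url, ""), topic_terms)  # "fetch" and score the page
        order.append((url, round(page_score, 2)))
        for link, anchor in graph.get(url, []):
            frontier.push(link, estimate(page_score, anchor, topic_terms))
    return order


if __name__ == "__main__":
    topic = {"focused", "crawler"}
    pages = {
        "a": "focused crawler survey",
        "b": "cooking recipes",
        "c": "a focused crawler design",
        "d": "crawler politeness",
    }
    graph = {"a": [("b", "recipes"), ("c", "focused crawler")], "c": [("d", "crawler")]}
    print(crawl("a", graph, pages, topic))
    # [('a', 1.0), ('c', 1.0), ('d', 0.5), ('b', 0.0)]
```

In this toy run the off-topic page `b` is discovered first but fetched last, because its low estimated score keeps it at the bottom of the heap. Any real relevance model, such as a trained classifier, could replace `relevance` and `estimate` without changing the frontier logic.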