Sciweavers

72 search results - page 2 / 15
» Ontology-Focused Crawling of Web Documents
Sort
View
VLDB
2000
ACM
125views Database» more  VLDB 2000»
13 years 10 months ago
Focused Crawling Using Context Graphs
Maintaining currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size and dynamic content of the web. Focused crawlers aim...
Michelangelo Diligenti, Frans Coetzee, Steve Lawre...
ICDM
2008
IEEE
186views Data Mining» more  ICDM 2008»
14 years 1 months ago
xCrawl: A High-Recall Crawling Method for Web Mining
Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
WWW
2003
ACM
14 years 7 months ago
Distributed Indexing of the Web Using Migrating Crawlers
Due to the tremendous increase rate and the high change frequency of Web documents, maintaining an up-to-date index for searching purposes (search engines) is becoming a challenge...
Odysseas Papapetrou, Stavros Papastavrou, George S...
ADMA
2009
Springer
142views Data Mining» more  ADMA 2009»
14 years 1 months ago
Crawling Deep Web Using a New Set Covering Algorithm
Abstract. Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Yan Wang, Jianguo Lu, Jessica Chen
WWW
2007
ACM
14 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma