Sciweavers

299 search results - page 8 / 60
» User-centric Web crawling
WWW
2007
ACM
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
JUCS
2008
Structure-Based Crawling in the Hidden Web
The number of applications that need to crawl the Web to gather data is growing at an ever-increasing pace. In some cases, the criterion to determine what pages must be included ...
Márcio L. A. Vidal, Altigran Soares da Silv...
STOC
2002
ACM
Crawling on web graphs
Colin Cooper, Alan M. Frieze
WWW
2005
ACM
User-centric Web crawling
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...
Sandeep Pandey, Christopher Olston
SAC
2003
ACM
Ontology-Focused Crawling of Web Documents
The Web, the largest unstructured database of the world, has greatly improved access to documents. However, documents on the Web are largely disorganized. Due to the distributed n...
Marc Ehrig, Alexander Maedche