Search Sciweavers | Sciweavers

WWW
2007
ACM

Detecting near-duplicates for web crawling

Download infolab.stanford.edu

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

JUCS
2008

Structure-Based Crawling in the Hidden Web

Download www.jucs.org

The number of applications that need to crawl the Web to gather data is growing at an ever increasing pace. In some cases, the criterion to determine what pages must be included ...

Márcio L. A. Vidal, Altigran Soares da Silv...

STOC
2002
ACM

Crawling on web graphs

Download www.math.cmu.edu

Colin Cooper, Alan M. Frieze

WWW
2005
ACM

User-centric Web crawling

Download www2005.org

Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...

Sandeep Pandey, Christopher Olston

SAC
2003
ACM

Ontology-Focused Crawling of Web Documents

Download dspc11.cs.ccu.edu.tw

The Web, the largest unstructured database of the world, has greatly improved access to documents. However, documents on the Web are largely disorganized. Due to the distributed n...

Marc Ehrig, Alexander Maedche
