Sciweavers

288 search results - page 5 / 58
» Crawling, Indexing, and Similarity Searching Images on the W...
Sort
View
WWW
2005
ACM
14 years 10 months ago
LSH forest: self-tuning indexes for similarity search
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
Mayank Bawa, Tyson Condie, Prasanna Ganesan
ICAPR
2005
Springer
14 years 3 months ago
Combining Text and Link Analysis for Focused Crawling
The number of vertical search engines and portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler evident. In this paper, we de...
George Almpanidis, Constantine Kotropoulos
WWW
2005
ACM
14 years 10 months ago
User-centric Web crawling
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...
Sandeep Pandey, Christopher Olston
PVLDB
2008
124views more  PVLDB 2008»
13 years 9 months ago
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...
WWW
2007
ACM
14 years 10 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma