We describe a joint probabilistic model of the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
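As a rough illustration of how a joint content-and-link model of this kind could be factored, the sketch below ties word generation and citation generation to a shared per-document topic mixture. It is a minimal sketch only: the variable names, the shared-mixture factorization, and the toy data are assumptions for illustration, not the formulation used in the abstract above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration only).
n_docs, n_words, n_topics = 4, 6, 2

# Per-document topic mixtures theta[d, z].
theta = rng.dirichlet(np.ones(n_topics), size=n_docs)
# Topic -> word distributions phi[z, w]   (content component).
phi = rng.dirichlet(np.ones(n_words), size=n_topics)
# Topic -> cited-document distributions psi[z, d'] (link component).
psi = rng.dirichlet(np.ones(n_docs), size=n_topics)

def joint_log_likelihood(word_counts, link_counts):
    """Log-likelihood of observed word and citation counts when both are
    generated from the same per-document topic mixture (illustrative)."""
    p_word = theta @ phi   # [n_docs, n_words]  P(word | doc)
    p_link = theta @ psi   # [n_docs, n_docs]   P(cited doc | doc)
    return (word_counts * np.log(p_word)).sum() + \
           (link_counts * np.log(p_link)).sum()

# Toy observations: word counts and outgoing-citation counts per document.
words = rng.integers(0, 3, size=(n_docs, n_words))
links = rng.integers(0, 2, size=(n_docs, n_docs))
print(joint_log_likelihood(words, links))
```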
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
Recently, there has been increased interest in the retrieval and integration of hidden Web data with a view to leveraging the high-quality information available in online databases. Alt...
In order to return relevant search results, a search engine must keep its local repository synchronized with the Web, but it is usually impossible to attain perfect freshness. Hence...
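One common way this trade-off is handled in practice is to schedule re-crawls by estimated change rate, refreshing fast-changing pages more often. The sketch below illustrates such a priority scheme; the URLs, change-rate estimates, and priority formula are assumptions for illustration, not a policy described in the abstract above.

```python
import heapq
import time

# Illustrative change-rate estimates (changes per day); a real crawler
# would estimate these from each page's observed revision history.
change_rate = {
    "https://example.org/news":  24.0,
    "https://example.org/docs":   0.5,
    "https://example.org/about":  0.05,
}

def refresh_order(last_crawled, now):
    """Order URLs by expected staleness: pages with the largest estimated
    number of unseen changes (rate * elapsed time) are refreshed first."""
    scored = []
    for url, rate in change_rate.items():
        elapsed_days = (now - last_crawled[url]) / 86400.0
        expected_changes = rate * elapsed_days
        heapq.heappush(scored, (-expected_changes, url))
    return [heapq.heappop(scored)[1] for _ in range(len(scored))]

now = time.time()
last = {url: now - 3600 for url in change_rate}  # all crawled an hour ago
print(refresh_order(last, now))
```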
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...