In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise remo...
James Lanagan, Paul Ferguson, Neil O'Hare, Alan F....
The problem of low-quality information on the Web is nowhere more important than in the domain of health, where unsound information and misleading advice can have serious consequen...
Thanh Tin Tang, David Hawking, Ramesh S. Sankarana...
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...