Search Sciweavers | Sciweavers

102 search results - page 15 / 21

» Agent-Based Approach for Web Crawling


WIDM
2006
ACM

Internet Technology

Coarse-grained classification of web sites by their structural properties

15 years 9 months ago

Download rvs.informatik.uni-leipzig.de

In this paper, we identify and analyze structural properties which reflect the functionality of a Web site. These structural properties consider the size, the organization, the co...

Christoph Lindemann, Lars Littig



WSDM
2010
ACM

Data Mining

Learning URL patterns for webpage de-duplication

15 years 10 months ago

Download www.wsdm-conference.org

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...



WWW
2008
ACM

Internet Technology

Recrawl scheduling based on information longevity

16 years 3 months ago

Download www2008.org

It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...

Christopher Olston, Sandeep Pandey



WWW
2005
ACM

Internet Technology

Analyzing online discussion for marketing intelligence

16 years 3 months ago

Download www.kamalnigam.com

We present a system that gathers and analyzes online discussion as it relates to consumer products. Weblogs and online message boards provide forums that record the voice of the p...

Natalie S. Glance, Matthew Hurst, Kamal Nigam, Mat...



WWW
2007
ACM

Internet Technology

Efficient search in large textual collections with redundancy

16 years 3 months ago

Download www2007.org

Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many ...

Jiangong Zhang, Torsten Suel
