Sciweavers

102 search results - page 15 / 21
» Agent-Based Approach for Web Crawling
Sort
View
WIDM
2006
ACM
14 years 2 months ago
Coarse-grained classification of web sites by their structural properties
In this paper, we identify and analyze structural properties which reflect the functionality of a Web site. These structural properties consider the size, the organization, the co...
Christoph Lindemann, Lars Littig
WSDM
2010
ACM
204views Data Mining» more  WSDM 2010»
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WWW
2008
ACM
14 years 8 months ago
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey
WWW
2005
ACM
14 years 8 months ago
Analyzing online discussion for marketing intelligence
We present a system that gathers and analyzes online discussion as it relates to consumer products. Weblogs and online message boards provide forums that record the voice of the p...
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Mat...
WWW
2007
ACM
14 years 8 months ago
Efficient search in large textual collections with redundancy
Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many ...
Jiangong Zhang, Torsten Suel