Sciweavers

299 search results - page 46 / 60
» User-centric Web crawling
CN
2006
13 years 8 months ago
A short walk in the Blogistan
The increasingly prominent new subset of Web pages called 'blogs' differs from traditional Web pages both in characteristics and in potential applications. We explore three ...
Edith Cohen, Balachander Krishnamurthy
WWW
2005
ACM
14 years 9 months ago
Analyzing online discussion for marketing intelligence
We present a system that gathers and analyzes online discussion as it relates to consumer products. Weblogs and online message boards provide forums that record the voice of the p...
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Mat...
WSDM
2010
ACM
14 years 3 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WWW
2009
ACM
14 years 9 months ago
Triplify: light-weight linked data publication from relational databases
In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...
Sören Auer, Sebastian Dietzold, Jens Lehmann,...
ERCIMDL
2005
Springer
14 years 2 months ago
mod_oai: An Apache Module for Metadata Harvesting
We describe mod_oai, an Apache 2.0 module that implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The OAI-PMH is the de facto standard for metadata...
Michael L. Nelson, Herbert Van de Sompel, Xiaoming...