Sciweavers

178 search results - page 19 / 36
» Scheduling Algorithms for Web Crawling
WSDM
2010
ACM
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WWW
2007
ACM
14 years 8 months ago
Extraction and classification of dense communities in the web
The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information and services, and there is a growing interest in tools for understandi...
Yon Dourisboure, Filippo Geraci, Marco Pellegrini
WWW
2009
ACM
14 years 8 months ago
Triplify: light-weight linked data publication from relational databases
In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...
Sören Auer, Sebastian Dietzold, Jens Lehmann,...
IPPS
2005
IEEE
14 years 1 month ago
QoS Aware Job Scheduling in a Cluster-Based Web Server for Multimedia Applications
We propose a cluster-based web server where a few computing nodes are separately reserved for high-performance computing applications, such as multimedia, SSL, and CGI. As an exam...
Jiani Guo, Laxmi N. Bhuyan, Raj Kumar, Sujoy Basu
WWW
2007
ACM
14 years 8 months ago
The discoverability of the web
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...