Sciweavers

139 search results - page 14 / 28
» An Approach to Identify Duplicated Web Pages
WWW
2007
ACM
14 years 8 months ago
Csurf: a context-driven non-visual web-browser
Web sites are designed for graphical mode of interaction. Sighted users can "cut to the chase" and quickly identify relevant information in Web pages. On the contrary, i...
Jalal Mahmud, Yevgen Borodin, I. V. Ramakrishnan
ER
2010
Springer
13 years 6 months ago
W-Ray: A Strategy to Publish Deep Web Geographic Data
This paper introduces an approach to address the problem of accessing conventional and geographic data from the Deep Web. The approach relies on describing the relevant d...
Helena Piccinini, Melissa Lemos, Marco A. Casanova...
WSDM
2010
ACM
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WISE
2009
Springer
14 years 2 months ago
Recommending Improvements to Web Applications Using Quality-Driven Heuristic Search
Planning out maintenance tasks to increase the quality of Web applications can be difficult for a manager. First, it is hard to evaluate the precise effect of a task on quality. S...
Stéphane Vaucher, Samuel Boclinville, Houar...
WWW
2008
ACM
14 years 8 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev