Sciweavers

139 search results - page 6 / 28
» An Approach to Identify Duplicated Web Pages
ICAIL
2007
ACM
13 years 11 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
SIGIR
2004
ACM
14 years 1 months ago
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
CIKM
2009
Springer
14 years 2 months ago
Automatic generation of topic pages using query-based aspect models
We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. We describe a general framework, which combines query log analysis to ...
Niranjan Balasubramanian, Silviu Cucerzan
WWW
2004
ACM
14 years 8 months ago
Matching web site structure and content
To keep an overview of a complex corporate web site, it is crucial to understand the relationship of contents, structure and the user's behavior. In this paper, we describe ...
Vassil Gedov, Carsten Stolz, Ralph Neuneier, Micha...
WWW
2005
ACM
14 years 1 months ago
Finding the boundaries of information resources on the web
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov