Sciweavers

92 search results - page 17 / 19
» HTML Pattern Generator--Automatic Data Extraction from Web P...
IMC
2006
ACM
14 years 1 month ago
Web search clickstreams
Search engines are a vital part of the Web and thus the Internet infrastructure. Therefore understanding the behavior of users searching the Web gives insights into trends, and en...
Nils Kammenhuber, Julia Luxenburger, Anja Feldmann...
CIDR
2003
125 views, Algorithms
13 years 8 months ago
Crossing the Structure Chasm
It has frequently been observed that most of the world’s data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructur...
Alon Y. Halevy, Oren Etzioni, AnHai Doan, Zachary ...
WSDM
2010
ACM
204 views, Data Mining
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
BTW
2003
Springer
103 views, Database
14 years 20 days ago
XPath-Aware Chunking of XML-Documents
Dissemination systems are used to route information received from many publishers individually to multiple subscribers. The core of a dissemination system consists of an efficient...
Wolfgang Lehner, Florian Irmert
AI
2005
Springer
13 years 9 months ago
Integrating Web Content Clustering into Web Log Association Rule Mining
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which...
Jiayun Guo, Vlado Keselj, Qigang Gao