Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
In this paper we present an improved version of the Probabilistic Ant based Clustering Algorithm for Distributed Databases (PACE). The most important feature of this algorithm is ...
In this paper, we present a model for obtaining and analyzing user profiles through web usage mining, in which log files are processed. The web log files register the activity of...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
When automatically extracting information from the World Wide Web, most established methods focus on spotting single HTML documents. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...