ICDE
2007
IEEE

Organizing Hidden-Web Databases by Clustering Visible Web Documents

In this paper, we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to cluster the forms according to the database domains to which they belong. We propose a new clustering approach that models Web forms as a set of hyperlinked objects and uses visible information in the form context (both within and in the neighborhood of forms) as the basis for similarity comparison. Because the clustering is performed over features that can be extracted automatically, the process is scalable. In addition, because it uses a rich set of metadata, our approach can handle a wide range of forms, including content-rich forms that contain multiple attributes as well as simple keyword-based search interfaces. An experimental evaluation over real Web data shows that our strategy generates high-quality clusters, as measured by both entropy and F-measure. This indicates that our approach...
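The abstract reports cluster quality in terms of entropy and F-measure. The sketch below computes both using the definitions commonly applied to document clustering: weighted per-cluster entropy of the true domain labels (lower is better) and the class-weighted best-match F-measure (higher is better). It is an illustration only. The abstract does not spell out the paper's exact formulation (for example the log base or weighting), so these may differ from what the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(clusters, labels):
    """Weighted average entropy of true labels within each cluster (lower is better).

    Assumed definition: E = sum_j (n_j / n) * E_j, with
    E_j = -sum_i p_ij * log2(p_ij) and p_ij = n_ij / n_j.
    """
    n = len(labels)
    by_cluster = {}
    # Group the true domain labels of the forms by their assigned cluster.
    for c, label in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(label)
    total = 0.0
    for members in by_cluster.values():
        n_j = len(members)
        counts = Counter(members)
        e_j = -sum((k / n_j) * math.log2(k / n_j) for k in counts.values())
        total += (n_j / n) * e_j
    return total


def overall_f_measure(clusters, labels):
    """Class-weighted best-match F-measure (higher is better).

    Assumed definition: F = sum_i (n_i / n) * max_j F(i, j), where
    F(i, j) = 2PR / (P + R), P = n_ij / n_j, R = n_ij / n_i.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap, so F(i, j) = 0
            p, r = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * p * r / (p + r))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical toy data: four forms from two domains.
    labels = ["book", "book", "job", "job"]
    print(cluster_entropy([0, 0, 1, 1], labels))    # perfect clustering
    print(overall_f_measure([0, 0, 1, 1], labels))
    print(cluster_entropy([0, 0, 0, 1], labels))    # one job form misplaced
    print(overall_f_measure([0, 0, 0, 1], labels))
```

On this toy data a perfect clustering yields entropy 0.0 and F-measure 1.0. Moving one "job" form into the "book" cluster raises the entropy to about 0.689 and lowers the F-measure to about 0.733, which is the direction both metrics are meant to capture.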
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2007
Where ICDE
Authors Luciano Barbosa, Juliana Freire, Altigran Soares da Silva