In semi-supervised clustering, domain knowledge can be converted to constraints and used to guide the clustering. In this paper we propose a feature selection algorithm for semi-s...
Many real-world datasets can be clustered along multiple dimensions. For example, text documents can be clustered not only by topic, but also by the author's gender or sentim...
A major difficulty in designing a document image segmentation methodology is selecting proper values for all the parameters involved. This is usually done after experimentation o...
This paper presents an adaptive algorithm for the segmentation of color images suited for document image analysis. The algorithm is based on a serialization of the k-means algor...
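The abstract above is truncated and does not describe how the k-means algorithm is serialized. As background only, here is a minimal sketch of standard k-means applied to RGB pixel values, assuming pixels are given as a list of `(r, g, b)` tuples. The function name `kmeans_colors` and its parameters are hypothetical. This is plain k-means, not the authors' adaptive, serialized algorithm.

```python
import random


def kmeans_colors(pixels, k, iters=20, seed=0):
    """Plain k-means on RGB tuples; returns (centroids, labels).

    Hypothetical helper for illustration; not the paper's serialized variant.
    """
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen pixels.
    centroids = [tuple(map(float, p)) for p in rng.sample(pixels, k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance in RGB.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in pixels
        ]
        # Update step: move each centroid to the mean of its assigned pixels.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


# Dark "ink" pixels and light "paper" pixels separate into two clusters.
pixels = [(0, 0, 0), (10, 10, 10), (5, 0, 5), (250, 250, 250), (255, 255, 255), (245, 250, 240)]
centroids, labels = kmeans_colors(pixels, 2)
print(labels)
```

With k = 2 on a document image, one cluster typically captures foreground (text) colors and the other the background; an adaptive method would additionally have to decide how many color clusters to use.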
In this paper we are interested in describing Web pages by how users interact with their content. Thus, an alternative but complementary way of labelling and classifying Web docu...