This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Clustering aims at extracting hidden structure in a dataset. While the problem of finding compact clusters has been widely studied in the literature, extracting arbitrarily formed ...
Clustering algorithms are routinely used in biomedical disciplines, and are a basic tool in bioinformatics. Depending on the task at hand, there are two most popular options, the ...
In this paper, a mean shift-based clustering algorithm is proposed. The mean shift is a kernel-type weighted mean procedure. Herein, we first discuss three classes of Gaussian, C...
Abstract: The thematic text segmentation task consists in identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. We propose in t...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...