Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure....
Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult...
In this paper, we describe a system that divides example sentences (data set) into clusters, based on the meaning of the target word, using a semi-supervised clustering technique....
Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering needs to solve an eigenvalue problem of the matrix converted from the simil...
Dimension reduction techniques have been successfully applied to face recognition and text information retrieval. The process can be time-consuming when the data set is large. Thi...
This paper addresses the application of PCA to categorical data prior to diagnosing a patient data set using a Case-Based Reasoning (CBR) system. The particularity is th...
We present an algorithm which partitions a data set into two parts of equal size and, experimentally, nearly the same distribution, measured through the likelihood of a Parzen kernel ...
Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically cho...
We describe a methodology for retrieving document images from large, extremely diverse collections. First, we perform content extraction, that is, the location and measurement of reg...
Discourse phenomena play a major role in text processing tasks. However, so far relatively little study has been devoted to the relevance of discourse phenomena for inference. The...
Azad Abad, Luisa Bentivogli, Ido Dagan, Danilo Gia...