In many clustering applications, the user has some vague notion of the number and membership of the desired clusters. However, it is difficult for the user to supply such knowledge explicitly to the clustering process. We propose to circumvent this difficulty by introducing a feedback mechanism. Bayesian inference for relevance feedback, as used in content-based image retrieval, is adapted to data clustering. Given the number of clusters, the proposed algorithm seeks information about the target partition by asking the user a sequence of queries, each about whether a pair of objects should be put in the same cluster. Information-theoretic criteria are adopted to select the queries presented to the user. The assumption made here is that cluster labels are "smooth", i.e., similar objects should share the same cluster label. We show that it is possible to obtain reasonable partitions based on the user feedback alone, without the need to specify a clu...
Anil K. Jain, Pavan Kumar Mallapragada, Martin H.
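The query loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a small, explicitly enumerated set of candidate partitions, a uniform prior, a hypothetical `noise` parameter for the chance that an answer is wrong, and the entropy of the predicted answer as the query-selection criterion. It does not model the smoothness assumption. All function names are illustrative.

```python
import itertools
import math


def same_cluster_prob(partitions, weights, i, j):
    """Posterior probability that objects i and j share a cluster."""
    return sum(w for p, w in zip(partitions, weights) if p[i] == p[j])


def binary_entropy(q):
    """Entropy (in bits) of a yes/no answer with P(yes) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def select_query(partitions, weights, asked):
    """Pick the unasked pair whose answer is most uncertain."""
    n = len(partitions[0])
    candidates = [pair for pair in itertools.combinations(range(n), 2)
                  if pair not in asked]
    return max(candidates,
               key=lambda pair: binary_entropy(
                   same_cluster_prob(partitions, weights, *pair)))


def update(partitions, weights, pair, answer, noise=0.1):
    """Bayes update: partitions that agree with the answer gain weight."""
    i, j = pair
    new = [w * ((1 - noise) if (p[i] == p[j]) == answer else noise)
           for p, w in zip(partitions, weights)]
    total = sum(new)
    return [w / total for w in new]


def run(target, partitions, noise=0.1):
    """Simulate a user who answers every pairwise query from `target`."""
    weights = [1.0 / len(partitions)] * len(partitions)
    asked = set()
    n = len(target)
    while len(asked) < n * (n - 1) // 2:
        pair = select_query(partitions, weights, asked)
        answer = target[pair[0]] == target[pair[1]]
        weights = update(partitions, weights, pair, answer, noise)
        asked.add(pair)
    best = max(range(len(partitions)), key=weights.__getitem__)
    return partitions[best], weights[best]


# All ways to split 4 objects into 2 non-empty clusters
# (object 0 is fixed to label 0 to remove label symmetry).
partitions = [(0,) + rest
              for rest in itertools.product((0, 1), repeat=3) if any(rest)]
best_partition, posterior = run((0, 0, 1, 1), partitions)
```

Enumerating partitions is only feasible for toy problems; a practical method would need a compact or sampled representation of the posterior over partitions.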