Active data clustering is a novel technique for clustering proximity data that uses principles from sequential experiment design to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooted in statistical decision theory. This is considered an important step towards the analysis of large-scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data. We present applications to unsupervised texture segmentation in computer vision and information retrieval in document databases.
Thomas Hofmann, Joachim M. Buhmann