: In a database with categorical attributes, each attribute defines a partition whose classes can be regarded as natural clusters of rows. In this paper we focus on finding a partition of the rows of a given database, that is as close as possible to the partitions associated to each attribute. We evaluate the closeness of two partitions by using a generalization of the classical conditional entropy. From this perspective, we wish to construct a partition (referred to as the median partition) such that the sum of the dissimilarities between this partition and all the partitions determined by the attributes of the database is minimal. Then, the problem of finding the median partition is an optimization problem, over the space of all partitions of the rows of the database, for which we give an approximative solution. To search more efficiently the large space of possible partitions we use a genetic algorithm where the partitions are represented by chromosomes. Our genetic algorithm obtain...
Dana Cristofor, Dan A. Simovici