Clustering can be defined as a data assignment problem where the goal is to partition the data into nonhierarchical groups of items. In our previous work, we suggested an information-theoretic criterion, based on the minimum description length (MDL) principle, for defining the goodness of a clustering of data. The basic idea behind this framework is to optimize the total code length over the data by encoding together data items belonging to the same cluster. In this setting, efficient coding is possible only by exploiting underlying regularities that are common to the members of a cluster, which means that this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce optimal code lengths in the worst-case sense. In this paper, we focus on the optimization aspect of the clusterin...