An iterative model selection algorithm is proposed. The algorithm seeks relevant features and an optimal number of codewords (or codebook size) as part of the optimization. We use a well-known separability measure to perform feature selection, and a Lagrangian with entropy and codebook size constraints to find the optimal number of codewords. We add two model selection steps to the quantization process: one for feature selection and the other for choosing the number of clusters. Once relevant and irrelevant features are identified, we also estimate the probability density function of the irrelevant features instead of discarding them. This avoids the bias problem of the separability measure favoring high-dimensional spaces.
Sangho Yoon, Robert M. Gray
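As an illustration of the two selection steps described above, here is a minimal sketch assuming a Fisher-style separability score, a simple 1-D k-means quantizer, and a penalized cost of the form distortion + λ·entropy. The score threshold and the value of λ are illustrative assumptions, not settings from the paper:

```python
import math
import random

random.seed(0)

def separability(class_a, class_b):
    """Fisher-style score: squared mean gap over pooled variance (illustrative)."""
    ma, mb = sum(class_a) / len(class_a), sum(class_b) / len(class_b)
    va = sum((x - ma) ** 2 for x in class_a) / len(class_a)
    vb = sum((x - mb) ** 2 for x in class_b) / len(class_b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

def kmeans_1d(data, k, iters=25):
    """Minimal 1-D k-means; returns mean distortion and codeword-usage entropy."""
    centers = random.sample(data, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            buckets[j].append(x)
        centers = [sum(b) / len(b) if b else centers[i]
                   for i, b in enumerate(buckets)]
    assign = [min(range(k), key=lambda i: (x - centers[i]) ** 2) for x in data]
    distortion = sum((x - centers[j]) ** 2 for x, j in zip(data, assign)) / len(data)
    probs = [assign.count(j) / len(data) for j in range(k) if assign.count(j) > 0]
    entropy = -sum(p * math.log2(p) for p in probs)  # empirical entropy, in bits
    return distortion, entropy

# Two-class toy data: feature 0 is discriminative, feature 1 is pure noise.
a = [random.gauss(0.0, 1.0) for _ in range(200)]
b = [random.gauss(4.0, 1.0) for _ in range(200)]
noise_a = [random.gauss(0.0, 1.0) for _ in range(200)]
noise_b = [random.gauss(0.0, 1.0) for _ in range(200)]

# Step 1: feature selection via the separability score (threshold assumed).
scores = [separability(a, b), separability(noise_a, noise_b)]
relevant = [i for i, s in enumerate(scores) if s > 1.0]

# Step 2: codebook-size selection via a Lagrangian-style penalized cost
# (lam is an assumed trade-off weight, not the paper's value).
lam = 0.5
data = a + b
costs = {}
for k in range(1, 9):
    d, h = kmeans_1d(data, k)
    costs[k] = d + lam * h
best_k = min(costs, key=costs.get)
```

In this sketch only feature 0 survives the selection step, and `best_k` balances distortion against the entropy (rate) penalty rather than being fixed in advance.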