Context-based adaptive entropy coding is an essential feature of modern image compression algorithms; however, the design of these coders is non-trivial because a balance must be struck between the benefits of using a large number of conditioning classes, or contexts, and the penalties resulting from data dilution. The problem is especially severe when coding small sub-images, where little data is available. In this paper, we propose an iterative algorithm that begins with a large number of conditioning classes and then uses a clustering procedure to reduce this number to a desired value. This method contrasts with the more usual approach of defining contexts in an ad hoc manner. Experiments are conducted on synthetic data sources having varying amounts of memory, as well as on the sub-images resulting from a wavelet decomposition of an image. The results show that our approach to context selection is effective and that the algorithm automatically lea...
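
The idea of starting from many conditioning classes and clustering them down to a desired number can be illustrated with a minimal sketch. The text above does not specify the clustering criterion, so the sketch assumes one: at each step, the two clusters whose merger adds the fewest bits to the empirical code length are combined. The names `code_length` and `merge_contexts`, and the representation of each context as a list of symbol counts, are hypothetical choices made for illustration only and are not taken from the paper.

```python
import math
from itertools import combinations


def code_length(counts):
    """Ideal empirical code length, in bits, for one context's symbol counts."""
    total = sum(counts)
    return -sum(c * math.log2(c / total) for c in counts if c > 0)


def merge_contexts(context_counts, target):
    """Greedily merge contexts until only `target` clusters remain.

    context_counts maps a context label to its list of symbol counts.
    Returns a list of (member_labels, merged_counts) pairs.
    Assumed criterion (not from the source): merge the pair of clusters
    whose shared model costs the fewest additional bits.
    """
    clusters = [([label], list(counts)) for label, counts in context_counts.items()]
    while len(clusters) > target:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = [a + b for a, b in zip(clusters[i][1], clusters[j][1])]
            # Extra bits paid for coding both clusters with one shared model.
            penalty = (code_length(merged)
                       - code_length(clusters[i][1])
                       - code_length(clusters[j][1]))
            if best is None or penalty < best[0]:
                best = (penalty, i, j, merged)
        _, i, j, merged = best
        members = clusters[i][0] + clusters[j][0]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, merged))
    return clusters


# Illustrative counts for a binary source observed under four contexts.
example = {"a": [90, 10], "b": [88, 12], "c": [10, 90], "d": [15, 85]}
for members, counts in merge_contexts(example, 2):
    print(members, counts)
```

Under this assumed criterion, contexts with similar symbol statistics are merged first, since sharing a model between them costs few extra bits. The empirical code length used here does not itself account for the cost of learning each model from limited data, so the target number of clusters is supplied by the caller rather than chosen automatically.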