Recently, the bag-of-words approach has been successfully applied to automatic image annotation, object recognition, and related tasks. The method first quantizes an image into visual terms and then extracts image-level statistics for classification. Although successful applications have been reported, the approach cannot model the spatial dependency among patches or the correspondence between patches and visual parts. Moreover, quantization degrades the descriptive power of patch features. This paper proposes the hidden maximum entropy (HME) approach for modeling visual concepts. Each concept is composed of a set of visual parts, with each part modeled by a Gaussian distribution. The spatial dependency and image-level statistics of the parts are modeled through maximum entropy. The model is learned with the proposed EM-IIS algorithm. We report preliminary results on the 260 concepts in the Corel dataset and compare against the maximum entropy (ME) approach. Our experiments on concept d...
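To make the modeling idea above concrete, the following is a minimal illustrative sketch, not the authors' implementation: each visual part is a Gaussian over patch descriptors, patches are softly assigned to parts (an E-step-style computation), and the resulting image-level part statistics feed a maximum-entropy (log-linear) concept classifier. All function names, shapes, and the random data are assumptions for illustration; the sketch omits the spatial-dependency features and the IIS weight update described in the paper.

```python
# Illustrative sketch (not the authors' code): Gaussian visual parts with a
# maximum-entropy (log-linear) layer over image-level part statistics.
import numpy as np
from scipy.stats import multivariate_normal

def part_responsibilities(patches, means, covs, priors):
    """Soft assignment of patches to Gaussian visual parts (E-step style).

    patches: (n_patches, d) local descriptors of one image
    means:   (n_parts, d) Gaussian means; covs: list of (d, d) covariances
    priors:  (n_parts,) mixing weights
    Returns an (n_patches, n_parts) responsibility matrix.
    """
    n_parts = len(priors)
    log_p = np.stack([
        np.log(priors[k]) + multivariate_normal.logpdf(patches, means[k], covs[k])
        for k in range(n_parts)
    ], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def image_statistics(resp):
    """Image-level feature: average soft part occupancy (a 'bag of parts')."""
    return resp.mean(axis=0)

def maxent_scores(features, weights):
    """Log-linear (maximum-entropy) concept posterior from image-level features.

    weights: (n_concepts, n_parts); returns a normalized distribution over concepts.
    """
    logits = weights @ features
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

# Tiny usage example with random data (2-D patches, 3 parts, 2 concepts).
rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 2))
means = rng.normal(size=(3, 2))
covs = [np.eye(2) for _ in range(3)]
priors = np.full(3, 1.0 / 3.0)
resp = part_responsibilities(patches, means, covs, priors)
feats = image_statistics(resp)
weights = rng.normal(size=(2, 3))
print(maxent_scores(feats, weights))
```

In a full EM-IIS training loop, the soft assignments would serve as the E-step and the maximum-entropy weights would be re-estimated by improved iterative scaling in the M-step; the sketch only shows the forward computation.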