The bag-of-words approach has become increasingly attractive for object category recognition and scene classification, as evidenced by several successful applications [5, 7, 11]. Its basic idea is to quantize an image into visual terms and exploit image-level statistics for classification. However, previous work still lacks the ability to model spatial dependency and the correspondence between patches and object parts. Moreover, quantization inevitably degrades the descriptive power of patch features. This paper proposes the hidden maximum entropy (HME) approach for modeling object categories. Each object is modeled by its parts, each described by a Gaussian distribution. The spatial dependency and image-level statistics of the parts are modeled through the maximum entropy approach. The model is learned by an EM-IIS (expectation maximization embedded with improved iterative scaling) algorithm. Our experiments on the Caltech 101 dataset show that the relative reduction o...