Automatic image annotation is a promising approach to enabling more effective keyword-based image retrieval. Traditionally, statistical models for image auto-annotation predict each annotation keyword independently, without considering correlations between words. In this paper, we propose a novel probabilistic model that combines the correspondence between keywords and image visual tokens/regions with word-to-word correlation. We use conditional probability to express both kinds of correlation uniformly, and we obtain the correspondence between keywords and visual features with the cross-media relevance model (CMRM). Experiments conducted on the standard Corel dataset demonstrate the effectiveness of the proposed method for automatic image annotation.
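To make the formulation concrete, a rough sketch is given below: the first line is the standard CMRM score for a keyword $w$ given a test image $I$ represented by visual tokens $b_1, \dots, b_m$, averaged over training images $J$ in the training set $\mathcal{T}$; the second line is one plausible way to fold word-to-word correlation into the score through conditional probability. The interpolation weight $\lambda$ and the factorization over previously selected keywords $w_1, \dots, w_{k-1}$ are illustrative assumptions, not the exact model proposed in this paper.

\[
P_{\mathrm{CMRM}}(w \mid I) \;\propto\; \sum_{J \in \mathcal{T}} P(J)\, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)
\]
\[
P(w_k \mid I, w_1, \dots, w_{k-1}) \;\propto\; P_{\mathrm{CMRM}}(w_k \mid I)^{\,1-\lambda} \left( \prod_{j=1}^{k-1} P(w_k \mid w_j) \right)^{\lambda}
\]

Under this kind of scheme, keywords would be selected greedily: the highest-scoring word is chosen first from the CMRM distribution, and each subsequent word is rescored by its co-occurrence with the words already assigned.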