The performance of traditional image retrieval approaches remains unsatisfactory, as they are restricted by the well-known semantic gap and the diversity of textual semantics. To tackle these problems, we propose an improved framework for image retrieval with an image query. The framework considers not only the discriminative power of various visual properties but also the semantic representation of the query image. Given a query image, we first perform content-based image retrieval (CBIR) separately for each visual property, obtaining one set of visually similar images per property. Then, a semantic representation of the query image is learnt from each image set. The semantic consistency among the textual indexes of each image set is measured in order to judge the confidence, in search, of the corresponding visual property and of the semantic representation obtained from it. Using these confidences, images that are both visually and semantically relevant are returned to the user by a combined similarity measure. Experiments on a large-scal...